(57) Abstract: EKG sensors (150) are placed on a patient (140) to receive electrocardiogram (EKG) signals, such as pacemaker signals, QRS complex signals, and irregular oscillatory signals that suggest an arrhythmia condition. A computing module (120) uses independent component analysis to separate the recorded EKG signals. The separated signals are displayed to help physicians analyze heart conditions and identify probable locations of abnormal heart conditions. At least a portion of the separated signals can be further displayed in a chaos phase space portrait to help detect abnormality in heart conditions.



WO 03/003905 A2



Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), 
European patent (AT, BE, BG, CH, CY, CZ, DE, DK, EE, 
ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE, SK, 
TR), OAPI patent (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, 
GW, ML, MR, NE, SN, TD, TG). 



For two-letter codes and other abbreviations, refer to the "Guidance Notes on Codes and Abbreviations" appearing at the beginning of each regular issue of the PCT Gazette.



Published: 

— without international search report and to be republished 
upon receipt of that report 






WO 03/003905 PCT/US02/21277 
SYSTEM AND METHOD FOR SEPARATING CARDIAC SIGNALS 



Background of the Invention 

Field of the Invention 

The present invention relates to medical devices for recording cardiac signals and separating the recorded cardiac signals.
Description of the Related Art 

Electrocardiogram (EKG) recording is a valuable tool for physicians to study patient heart conditions. In a typical 12-lead arrangement, up to 12 sensors are placed on a subject's chest or abdomen and limbs to record the electric signals from the beating heart. Each sensor, along with a reference electrode, forms a separate channel that produces an individual signal. The signals from the different sensors are recorded on an EKG machine as different channels. The sensors are usually unipolar or bipolar electrodes or other devices suitable for measuring the electrical potential on the surface of a human body. Since different parts of the heart, such as the atria and ventricles, produce different spatial and temporal patterns of electrical activity on the body surface, the signals recorded on the EKG machine are useful for analyzing how well individual parts of the heart are functioning.

A typical heartbeat signal has several well-characterized components. The first component is a small hump at the beginning of a heartbeat called the "P-Wave". This signal is produced by the right and left atria. There is a flat area after the P-Wave which is part of what is called the PR Interval. During the PR interval the electrical signal is traveling through the atrioventricular (AV) node. The next large spike in the heartbeat signal is called the "QRS Complex." The QRS Complex is a tall, spiky signal produced by the ventricles. Following the QRS complex is another smaller bump in the signal called the "T-Wave," which represents the electrical resetting of the ventricles in preparation for the next signal. When the heart beats continuously, the P-QRS-T waves repeat over and over.

Many publications have described studying cardiac signals and detecting abnormal heart conditions. Sample publications include U.S. Patent Publication No. 20020052557; Podrid & Kowey, Cardiac Arrhythmia: Mechanisms, Diagnosis, and Management, Lippincott Williams & Wilkins Publishers (2nd edition, August 15, 2001); Marriott & Conover, Advanced Concepts in Arrhythmias, Mosby Inc. (3rd edition, January 15, 1998); and Josephson, M.E., Clinical Cardiac Electrophysiology: Techniques and Interpretations, Lippincott Williams & Wilkins Publishers (3rd edition, December 15, 2001).

Unfortunately, although EKG signals have been studied for decades, they remain difficult to assess because EKG signals recorded at the body surface are mixtures of signals from multiple sources. Typically, it is relatively straightforward to measure the shape of the QRS complex since this signal is so strong. However, irregularly shaped P-wave or T-wave signals, along with weak irregular oscillatory signals that suggest a heart arrhythmia, are often masked by large pacemaker signals or by the strong QRS complex signals. Thus, it can be very difficult to isolate small irregular oscillatory signals and to identify arrhythmia conditions.

In addition, atrial and ventricular signals are sometimes undesirably superimposed over one another. In many cases, diagnosis of disease states requires these signals to be separated from one another. For example, it might be desirable to separate P wave signals from QRS complex signals, so that signals originating in an atrium are isolated from signals representing concurrent activities in the ventricle.

In some practices the EKG signals are electronically "filtered" by excluding signals of certain frequencies. The signals are also "averaged" to remove largely random or asynchronous data, which is assumed to be meaningless "noise." The filtering and averaging methods irreversibly eliminate portions of the recorded signals. In addition, it has not been proven that the more random data is truly "noise" and truly meaningless. It might be that the signals that are removed are indicative of a disease state in a patient. Another method, disclosed in U.S. Patent No. 6,308,094 entitled "System for prediction of cardiac arrhythmias," uses the Karhunen-Loeve Transformation to decompose or compress cardiac signals into elements that are deemed "significant." As a result, the information that is deemed "insignificant" is lost.
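The loss described above can be illustrated with a minimal Python/NumPy sketch of a generic Karhunen-Loeve (principal component) truncation. It is not the specific procedure of U.S. Patent No. 6,308,094; the function name `klt_truncate`, the random test data and the number of retained components `k` are illustrative assumptions. Whatever falls outside the retained basis cannot be recovered from the compressed representation:

```python
import numpy as np

def klt_truncate(x, k):
    """Project J-channel data x (shape J x T) onto its k leading
    Karhunen-Loeve (principal) directions and reconstruct it.
    Returns the reconstruction and the relative residual energy."""
    mean = x.mean(axis=1, keepdims=True)
    xc = x - mean
    # Eigen-decomposition of the channel covariance; eigh sorts eigenvalues ascending
    evals, evecs = np.linalg.eigh(xc @ xc.T / xc.shape[1])
    basis = evecs[:, ::-1][:, :k]           # the k most "significant" directions
    x_hat = basis @ (basis.T @ xc) + mean   # everything outside the basis is dropped
    residual = np.sum((x - x_hat) ** 2) / np.sum(xc ** 2)
    return x_hat, residual

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 1000))   # hypothetical 4-channel recording
_, r_full = klt_truncate(x, 4)       # all components kept: nothing is lost
_, r_two = klt_truncate(x, 2)        # truncated: part of the energy is gone for good
print(r_full, r_two)
```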

Compared to other signal separation applications, separating EKG recording signals presents additional challenges. For example, the sources are not always stationary, since the heart chambers contract and expand during beating. Additionally, the activity of a single chamber may be mistaken for multiple sources because of the presence of moving waves of electrical activity across the heart. If electrodes are not securely attached to the patient, or if the patient moves (for example, older patients may suffer from uncontrolled jittering), the movement of the electrodes also undesirably generates signals. In addition, the EKG can sense multiple signals that are unrelated to the cardiac signature, such as myopotentials, i.e., electrical signals from muscles other than the heart.

Cardiac rhythm management systems that store a list of triggers have also been disclosed. U.S. Patent No. 6,400,982 entitled "Cardiac rhythm management system with arrhythmia prediction and prevention" discloses such a system. If a trigger matches detected cardiac signals from a patient, the system calculates the probability of arrhythmia and activates a prevention therapy for the patient. However, the cardiac signals are in fact mixtures of signals from multiple sources, and the signals that are important for arrhythmia detection can be masked by other signals. It is therefore desirable to separate the cardiac signals used in cardiac rhythm management systems.

Independent component analysis (ICA) is a technique for separating mixed source signals (components) which are presumably independent from each other. In its simplified form, independent component analysis applies an "un-mixing" matrix of weights to the mixed signals, for example by multiplying the matrix with the mixed signals, to produce separated signals. The weights are assigned initial values and then adjusted to minimize information redundancy in the separated signals. Because this technique does not require information on the source of each signal, it is known as a "blind source separation" method. Blind separation problems refer to the idea of separating mixed signals that come from multiple independent sources. Although many ICA techniques are currently known, most have evolved from the original work described in U.S. Patent No. 5,706,402, issued on January 6, 1998. Additional references on ICA and blind source separation can be found in, for example, A. J. Bell and T. J. Sejnowski, Neural Computation 7:1129-1159 (1995); Te-Won Lee, Independent Component Analysis: Theory and Applications, Kluwer Academic Publishers, Boston, September 1998; Hyvarinen et al., Independent Component Analysis, 1st edition (Wiley-Interscience, May 18, 2001); Mark Girolami, Self-Organizing Neural Networks: Independent Component Analysis and Blind Source Separation (Perspectives in Neural Computing) (Springer Verlag, September 1999); and Mark Girolami (Editor), Advances in Independent Component Analysis (Perspectives in Neural Computing) (Springer Verlag, August 2000). Singular value decomposition algorithms have been disclosed in Adaptive Filter Theory by Simon Haykin (Third Edition, Prentice-Hall (NJ), 1996).
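As a minimal sketch of the un-mixing operation itself, assuming two synthetic sources and a known 2 x 2 mixing matrix `a` (both hypothetical), the code below multiplies an un-mixing matrix with the mixed signals. In real blind source separation the mixing matrix is unknown and the weights must be estimated; the exact inverse is used here only to show what a perfect un-mixing matrix does.

```python
import numpy as np

# Two hypothetical source signals (rows) sampled at 500 Hz for 1 s
t = np.arange(500) / 500.0
s = np.vstack([np.sin(2 * np.pi * 1.2 * t),            # slow, heartbeat-like rhythm
               np.sign(np.sin(2 * np.pi * 7.0 * t))])  # faster square-wave oscillation

a = np.array([[1.0, 0.5],    # hypothetical mixing matrix: each sensor (row)
              [0.3, 1.0]])   # sees a weighted sum of both sources
x = a @ s                    # recorded (mixed) channels

w = np.linalg.inv(a)         # ideal un-mixing matrix; ICA has to estimate this blindly
y = w @ x                    # separated signals: the matrix times the mixtures
print(np.allclose(y, s))     # True
```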

There have been suggestions to use chaos theory to analyze cardiac signals to detect abnormal heart conditions. Sample disclosures include U.S. Patent Nos. 5,439,004, 5,342,401, 5,447,520 and 5,456,690; PCT application Nos. WO02/34123 and WO0224276; and Smith et al., Electrical Alternans and Cardiac Electrical Instability, Circulation, Vol. 77, No. 1, pp. 110-121 (January 1988). Other approaches are disclosed in U.S. Patent No. 5,447,520 issued to Spano, et al. and U.S. Patent No. 5,201,321 issued to Fulton. Chaos theory is defined as the study of complex nonlinear dynamic systems. Complex implies just that, nonlinear implies recursion and higher mathematical algorithms, and dynamic implies non-constant and non-periodic. Thus chaos theory is, very generally, the study of changing complex systems based on mathematical concepts of recursion, whether in the form of a recursive process or a set of differential equations modeling a physical system.

When a bounded chaotic system has some kind of long-term pattern, but the pattern is not a simple periodic oscillation or orbit, the system has a "Strange Attractor". If the system's behavior is plotted in a graph over an extended period, patterns can be discovered that are not obvious in the short term. In addition, in these types of systems, usually the same pattern emerges regardless of the initial conditions. The region of initial conditions for which this recurring pattern holds true is called the "basin of attraction" for the attractor. Chaos theory methods have been described in, for example, N. H. Packard, J. P. Crutchfield, J. Doyne Farmer, and R. S. Shaw, Geometry of a Time Series, Physical Review Letters, 47 (1980), p. 712; F. Takens, Detecting Strange Attractors in Turbulence, in Lecture Notes in Mathematics 898, D. A. Rand and L. S. Young, eds. (Berlin: Springer-Verlag, 1981), p. 336; and J. P. Crutchfield, J. Doyne Farmer, N. H. Packard, and R. S. Shaw, On Determining the Dimension of Chaotic Flows, Physica 3D (1981), pp. 605-17.
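The Packard et al. and Takens references reconstruct a phase space from a single measured time series using delay coordinates. A minimal sketch of that embedding follows; the function name `delay_embed` and its `dim` and `lag` parameters are illustrative choices, not part of the disclosed system, which instead plots separated components against one another.

```python
import numpy as np

def delay_embed(signal, dim=3, lag=1):
    """Delay-coordinate embedding of a scalar time series.

    Row k of the result is (s[k], s[k + lag], ..., s[k + (dim - 1) * lag]).
    """
    signal = np.asarray(signal, dtype=float)
    n = len(signal) - (dim - 1) * lag     # number of complete delay vectors
    if n <= 0:
        raise ValueError("series too short for this dim/lag")
    return np.column_stack([signal[i * lag : i * lag + n] for i in range(dim)])

points = delay_embed(np.arange(6), dim=3, lag=2)
print(points)
# [[0. 2. 4.]
#  [1. 3. 5.]]
```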

For all of these reasons, what is needed in the art is a system that can accurately separate 
medical signals from one another in order to diagnose disease states. 
Summary of the Invention

The present application discloses systems and methods for using independent component analysis to determine the existence and location of anomalies, such as arrhythmias, of a heart. The disclosed systems and methods can be applied to suggest the location of atrial fibrillation, and to locate arrhythmogenic regions of a chamber of the heart using heart cycle signals measured from a body surface of the patient. Non-invasive localization of the ectopic origin allows focal treatment to be quickly targeted to effectively inhibit these complex arrhythmias without having to rely on widespread and time-consuming sequential searches or on massively invasive simultaneous intracardiac sensor techniques. The effective localization of these complex arrhythmias can be significantly enhanced by using independent component analysis to separate superimposed heart cycle signals originating from different chambers or regions of the heart tissue. In addition, the signals that are separated by ICA are preferably also analyzed by plotting them on a chaos phase space portrait.

One aspect of the invention relates to a medical system for separating cardiac signals. This aspect includes a receiving module to receive recorded cardiac signals from medical sensors, a computing module to separate the received signals using independent component analysis to produce separated signals, and a display module to display the separated signals.

Another aspect of the invention relates to a method of detecting arrhythmia in a patient. The method includes placing EKG sensors on a patient to produce recorded EKG signals, sending the recorded signals to a computing module to separate the recorded signals into separated signals using independent component analysis, and reviewing a display of the separated signals to determine the existence of arrhythmia in the patient. In a preferred embodiment, each component of the separated signals corresponds to a channel of recorded signals and its sensor location; therefore, when one or more components of the separated signals that suggest arrhythmia are detected, the corresponding one or more sensor locations also suggest the location of the arrhythmia.

Yet another aspect of the invention relates to a cardiac rhythm management system. The system includes a cardiac signal recording module to record cardiac signals of a patient, a computing module to separate the recorded signals into separated signals using independent component analysis, and a detection module to detect or to predict an abnormal condition based on analyzing the separated signals. The system also includes a treatment module to treat the patient or a warning module to issue a warning when the abnormal condition is detected or predicted.

Other aspects and embodiments of the invention are described below in the detailed 
description section or defined by the claims. 
Brief Description of the Drawings
FIGURE 1 is a diagram of an EKG system according to one embodiment of the invention.

FIGURE 2 is a flowchart illustrating one embodiment of a process for separating cardiac signals.

FIGURE 3A is a sample chart of recorded EKG signals.

FIGURE 3B is a sample chart of separated EKG signals.

FIGURE 3C is a sample chart of one component of separated signals back projected on the recorded signals.

FIGURE 4A is a chaos phase space portrait of three components of separated EKG signals of a healthy subject.

FIGURE 4B is a chaos phase space portrait of three components of separated EKG signals of a subject with an abnormal heart condition.

Detailed Description of the Preferred Embodiment 
Embodiments of the invention relate to a system and method for accurately separating medical signals in order to determine disease states in a patient. In one embodiment, the system analyzes EKG signals in order to determine whether a patient has a heart ailment or irregularity. As discussed in detail below, embodiments of the system utilize the techniques of independent component analysis to separate the medical signals from one another.

In addition to the signal separation technique, embodiments of the invention also relate to systems and methods that first separate signals using ICA, and then perform an analysis on a specific isolated signal, or set of isolated signals, using a "chaos" analysis. As described earlier, chaos theory (also called nonlinear dynamics) studies patterns that are not completely random but cannot be determined by simple formulas. Because cardiac signals are typically non-random but cannot be easily described by a simple formula, chaos theory analysis as described below provides an effective tool to analyze these signals and determine disease states.

Accordingly, once the signals are separated using ICA, they can be plotted to produce a chaos phase space portrait. By reviewing the patterns in the phase space portrait, for example reviewing the existence and location of one or more attractors, or comparing established healthy patterns and established abnormal patterns with the patterns of the patient, a user is able to assess the likelihood of abnormality in the signals, which may indicate disease conditions in the patient.

FIGURE 1 is a diagram of an EKG system that includes a computing module for signal separation according to one embodiment of the present invention. As shown in FIGURE 1, electrode sensors 150 are placed on the chest and limbs of a patient 140 to record electric signals. The electrodes send the recorded signals to a receiving module 110 of the EKG system 100. After optionally performing signal amplification, analog-to-digital conversion, or both, the receiving module 110 sends the received signals to a computing module 120 of the EKG system 100. The computing module 120 uses an independent component analysis method to separate the recorded signals to produce separated signals. The independent component analysis method is described in detail in the Appendix and below with respect to FIGURE 2.

The computing module 120 can be implemented in hardware, software, or a combination of both. It can be located physically within the EKG system 100 or connected to the EKG system 100 to receive the recorded signals. A displaying module 130, which includes a printer or a monitor, displays the separated signals on paper or on screen. The displaying module 130 can be located within the EKG system 100 or connected to it. Optionally, the displaying module 130 also displays the recorded signals on paper or on screen. In one embodiment, the displaying module also displays some components of the separated signals in a chaos phase space portrait.

In one embodiment, the EKG system 100 also includes a database (not shown) that stores recognized EKG signal triggers and corresponding diagnoses. The triggers refer to conditions that indicate the likelihood of arrhythmia. For example, triggers can include sinus beats, premature sinus beats, beats following long sinus pauses, long-short beat sequences, R-on-T-wave beats, ectopic ventricular beats, premature ventricular beats, and so forth. Triggers can include threshold values that indicate arrhythmia, such as threshold values of ST elevation, heart rate, increase or decrease in heart rate, late potentials, abnormal autonomic activity, and so forth. A left bundle-branch block diagnosis can be associated with triggers such as the absence of a Q wave in leads I and V6, a QRS duration of more than 120 msec, small notching of the R wave, etc.

Triggers can be based on a patient's history, for example the percentage of abnormal beats detected during an observation period, the percentage of premature or ectopic beats detected during an observation period, heart rate variation during an observation period, and so forth. Triggers may also include, for example, the increase or decrease of ST elevation in beat rate, the increase in frequency of abnormal or premature beats, and so forth.
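A history-based trigger reduces an observation period to a few numbers. The sketch below assumes that beat-to-beat R-R intervals in milliseconds are already available and uses a hypothetical rule (an R-R interval shorter than 80% of the median) to flag premature beats; neither the rule nor the function name `history_triggers` comes from the text.

```python
import numpy as np

def history_triggers(rr_ms, premature_fraction=0.8):
    """Summarize an observation period from R-R intervals (milliseconds).

    A beat is flagged 'premature' (hypothetical rule) when its R-R interval
    is shorter than premature_fraction times the median R-R interval.
    """
    rr = np.asarray(rr_ms, dtype=float)
    premature = rr < premature_fraction * np.median(rr)
    heart_rate = 60000.0 / rr                     # beat-by-beat rate in beats per minute
    return {
        "percent_premature": float(100.0 * premature.mean()),
        "heart_rate_range_bpm": float(heart_rate.max() - heart_rate.min()),
    }

summary = history_triggers([800, 800, 600, 1000, 800])
print(summary)   # one premature beat out of five; rate spans 60 to 100 bpm
```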

A matching module (not shown) attempts to match the separated signals with one or more of the stored triggers. If a match is found, the matching module displays the corresponding matched diagnosis, or sends a warning to a healthcare worker or to the patient. Methods such as computer-implemented logic rules, classification trees, expert system rules, statistical or probability analysis, pattern recognition, database queries, artificial intelligence programs and others can be used to match the separated signals with stored triggers.
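Of the matching methods listed, computer-implemented logic rules are the simplest to sketch. The example below is hypothetical: the `Trigger` record, the feature names and the 10% premature-beat threshold are assumptions, while the left bundle-branch block rule uses the criteria given in the text (a QRS duration of more than 120 msec and no Q wave in leads I and V6).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class Trigger:
    name: str
    diagnosis: str
    rule: Callable[[Dict[str, Any]], bool]   # logic rule over measured features

# Hypothetical stored triggers
TRIGGERS: List[Trigger] = [
    Trigger("lbbb", "left bundle-branch block",
            lambda f: f["qrs_ms"] > 120 and not f["q_wave_I"] and not f["q_wave_V6"]),
    Trigger("frequent_premature", "frequent premature beats",
            lambda f: f["percent_premature"] > 10.0),   # assumed threshold
]

def match(features: Dict[str, Any], triggers: List[Trigger] = TRIGGERS) -> List[str]:
    """Return the diagnoses whose trigger rules the features satisfy."""
    return [t.diagnosis for t in triggers if t.rule(features)]

features = {"qrs_ms": 140, "q_wave_I": False, "q_wave_V6": False,
            "percent_premature": 2.0}
print(match(features))   # ['left bundle-branch block']
```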

FIGURE 2 is a flowchart illustrating one embodiment of a process for separating EKG signals. The process starts from a start block 202 and proceeds to a block 204, where the computing module 120 of the EKG system 100 receives the recorded signals $X_j$ from the electrode sensors, with $j = 1, \ldots, J$ and $J$ being the number of channels. Prior to processing, the signals can be amplified to strengths suitable for computer processing. Analog-to-digital conversion of the signals can also be performed.

From the block 204, the process proceeds to a block 206, where the initial values for an "un-mixing" matrix of scaling weights $W_{ij}$ are selected. In one embodiment, initial values for a set of offset weights $W_{i0}$ are also selected. The process then proceeds to a block 208, where a plurality of training signals $Y_i$ are produced by operating the matrix on the recorded signals. In a preferred embodiment, the training signals are produced by multiplying the matrix with the recorded signals such that $Y_i = \sum_{j=1}^{J} W_{ij} X_j$, i.e., $Y = WX$. In one embodiment, the weights $W_{i0}$ are included such that $Y_i = \sum_{j=1}^{J} W_{ij} X_j + W_{i0}$. The process proceeds from the block 208 to a block 210, wherein the scaling weights $W_{ij}$ and, optionally, the weights $W_{i0}$ are adjusted to reduce the information redundancy among the training signals. Methods of adjusting the weights are described in the Appendix.

The process proceeds to a decision block 212, where the process determines whether the information redundancy has been reduced to a satisfactory level. The criteria for this determination are described in the Appendix. If the process determines that the information redundancy among the training signals has been reduced to a satisfactory level, the process proceeds to a block 214, where the training signals are displayed as separated signals $Y_i$, with $i = 1, \ldots, I$ and $I$ being the number of components of the separated signals. Otherwise, the process returns from the block 212 to the block 208 to produce new training signals and again adjust the weights. In a preferred embodiment, $I$, the number of components of separated signals, is equal to $J$, the number of channels of recorded signals. From the block 214, the process proceeds to an end block 216.
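The loop of blocks 206 to 214 can be sketched in Python/NumPy. The weight-adjustment method of the Appendix is not reproduced here; as an assumption, the sketch uses the natural-gradient form of the Infomax rule associated with Bell and Sejnowski, with a `tanh` nonlinearity suited to super-Gaussian sources. It also whitens the data first, subtracts the channel means instead of learning the offsets $W_{i0}$, and stops when the weight change falls below `tol` as a stand-in for the "satisfactory level" test of block 212. All names and parameter values are illustrative.

```python
import numpy as np

def infomax_ica(x, lr=0.1, max_iter=1000, tol=1e-6, seed=0):
    """Natural-gradient Infomax ICA (illustrative sketch).

    x: recorded signals, shape (J, T), one row per channel.
    Returns (y, w) with y = w @ (x - channel means).
    """
    x = x - x.mean(axis=1, keepdims=True)       # mean removal replaces the offsets W_i0
    d, e = np.linalg.eigh(np.cov(x))
    whitener = e @ np.diag(d ** -0.5) @ e.T     # PCA whitening
    z = whitener @ x
    n, t = z.shape
    rng = np.random.default_rng(seed)
    w = np.eye(n) + 0.01 * rng.standard_normal((n, n))    # block 206: initial weights
    for _ in range(max_iter):
        u = w @ z                                          # block 208: training signals
        dw = lr * (np.eye(n) - np.tanh(u) @ u.T / t) @ w   # block 210: adjust weights
        w += dw
        if np.abs(dw).max() < tol:                         # block 212: stop test
            break
    w_total = w @ whitener                                 # un-mixing for the centred x
    return w_total @ x, w_total                            # block 214: separated signals

# Usage: two super-Gaussian (Laplacian) sources mixed into two channels
rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 5000))
x = np.array([[1.0, 0.6], [0.4, 1.0]]) @ s
y, w = infomax_ica(x)
```

Because ICA recovers sources only up to order, sign and scale, a separated component should be compared with a source through the absolute value of their correlation rather than sample by sample.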

Applying the un-mixing matrix $W$ with the final weight values to the recorded signals yields the separated signals matrix $Y = WX$, whose rows represent the time courses of relative strengths/activity levels (and relative polarities) of the respective separated components. The columns of the inverse matrix $W^{-1}$ represent the relative projection strengths (and relative polarities) of the respective separated components onto the channels of recorded signals; they give the surface topography of each component and provide evidence for the components' physiological origins. The back projection of the $i$th independent component onto the recorded signal channels is given by the outer product of the $i$th column of the inverse un-mixing matrix with the $i$th row of the separated signals matrix, and is expressed in the same channels and units as the original recorded signals. Thus cardiac dynamics or activities of interest accounted for by single or by multiple components can be obtained by projecting one or more ICA components back onto the recorded signals, $X = W^{-1} Y$, where $Y$ is the matrix of separated signals, $Y = WX$.
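The back projection can be written in a few lines of NumPy. In this sketch, `back_project` is a hypothetical helper, the un-mixing matrix is assumed square ($I = J$) and invertible, and the data are random stand-ins; the check at the end confirms that the single-component back projections sum to $X = W^{-1} Y$.

```python
import numpy as np

def back_project(w, y, components):
    """Project selected separated components back onto the recorded channels.

    w: square un-mixing matrix (J x J), y: separated signals (J x T),
    components: indices of the components to keep.
    Returns a J x T array in the channels and units of the recording.
    """
    w_inv = np.linalg.inv(w)        # columns: surface topographies of the components
    idx = list(components)
    # Sum over the chosen i of the outer products w_inv[:, i] (J,) x y[i, :] (T,)
    return w_inv[:, idx] @ y[idx, :]

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 200))   # stand-in recording with J = 3 channels
w = rng.standard_normal((3, 3))     # stand-in un-mixing matrix
y = w @ x
parts = [back_project(w, y, [i]) for i in range(3)]
print(np.allclose(sum(parts), x))   # True: the components add back up to X
```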

The separated signals are determined by the ICA method to be statistically independent and are presumed to be from independent sources. Regardless of whether there is in fact some dependence between the separated EKG signals, test results show that the separated signals provide a beneficial perspective for physicians to detect and to locate the abnormal heart conditions of a patient.

In a preferred embodiment, time delay between source signals is ignored. Since the sampling frequencies of cardiac signals are in the relatively low 200-500 Hz range, the effect of time delay can be neglected.
Improved methods of ICA can be used to speed up the signal separation process. In one embodiment, a generalized Gaussian mixture model is used to classify the recorded signals into mutually exclusive classes. The classification methods have been disclosed in U.S. Patent Application No. 09/418,099 titled "Unsupervised adaptation and classification of multiple classes and sources in blind source separation" and PCT Application No. WO 01/27874 titled "Unsupervised adaptation and classification of multi-source data using a generalized Gaussian mixture model." In another embodiment, the computing module 120 incorporates a priori knowledge of cardiac dynamics, for example supposing separated QRS components to be highly kurtotic and (ar)rhythmic component(s) to be sub-Gaussian. ICA methods with incorporated a priori knowledge have been disclosed in T-W. Lee, M. Girolami and T. J. Sejnowski, Independent Component Analysis using an Extended Infomax Algorithm for Mixed Sub-Gaussian and Super-Gaussian Sources, Neural Computation, 1999, Vol. 11(2): 417-441.
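A minimal sketch of the Extended Infomax idea from the Lee, Girolami and Sejnowski reference follows. It assumes whitened, mean-removed data and uses the learning rule $\Delta W \propto [I - K \tanh(u)u^T - uu^T]W$, where each diagonal entry of $K$ is chosen on the fly as +1 (super-Gaussian, such as a highly kurtotic QRS-like component) or -1 (sub-Gaussian) from the sign of $E[\operatorname{sech}^2 u_i]E[u_i^2] - E[u_i \tanh u_i]$. The learning rate, iteration count and function name are illustrative assumptions, not values from the text.

```python
import numpy as np

def extended_infomax(x, lr=0.05, n_iter=2000, seed=0):
    """Extended Infomax sketch. x: recorded signals, shape (J, T).

    Returns (y, w, k): separated signals, un-mixing matrix for the centred
    data, and k[i] = +1 (super-Gaussian) or -1 (sub-Gaussian) per component.
    """
    x = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(np.cov(x))
    whitener = e @ np.diag(d ** -0.5) @ e.T       # PCA whitening
    z = whitener @ x
    n, t = z.shape
    eye = np.eye(n)
    w = eye + 0.01 * np.random.default_rng(seed).standard_normal((n, n))
    for _ in range(n_iter):
        u = w @ z
        th = np.tanh(u)
        # Switching statistic E[sech^2 u] E[u^2] - E[u tanh u]; its sign picks K
        k = np.sign(np.mean(1.0 - th ** 2, axis=1) * np.mean(u ** 2, axis=1)
                    - np.mean(u * th, axis=1))
        # Natural-gradient rule: dW = lr * [I - K tanh(u) u^T - u u^T] W
        w += lr * (eye - (k[:, None] * th) @ u.T / t - u @ u.T / t) @ w
    w_total = w @ whitener
    return w_total @ x, w_total, k

# Usage: one super-Gaussian and one sub-Gaussian source
rng = np.random.default_rng(2)
s = np.vstack([rng.laplace(size=5000), rng.uniform(-1.0, 1.0, size=5000)])
x = np.array([[1.0, 0.5], [0.7, 1.0]]) @ s
y, w, k = extended_infomax(x)
```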

FIGURE 3A illustrates a ten-second portion of 12 channels of signals that were gathered as part of an EKG recording. The horizontal axis in FIGURE 3A represents time progression of ten seconds. The vertical axis represents channel numbers 1 to 12. The signals of FIGURE 3A are, in this case, from a patient whose recording contained a mixture of multiple signals, including QRS complex signals, pacemaker signals, multiple oscillatory activity signals, and noise. However, because these signals were all occurring simultaneously, they cannot be easily separated from one another using conventional EKG equipment.

In contrast, FIGURE 3B illustrates output signals separated from the mixture signals of FIGURE 3A, according to one embodiment of the present invention. As above, the horizontal axis in FIGURE 3B represents time progression of ten seconds and the vertical axis represents the separated components 1 to 12. The separated signals in FIGURE 3B are displayed as components 1 to 12 corresponding to the channels 1 to 12 in FIGURE 3A, so that a physician can relate each separated signal to the sensor location of its corresponding recorded signal on the patient's body. For example, in a standard 12-lead arrangement, leads II, III and aVF represent signals from the inferior region. Leads V1 and V2 represent signals from the septal region. Leads V5, V6, I and aVL represent signals from the lateral heart. Right and posterior heart regions typically require special lead placement for recording. To better identify the location of a heart condition, more than 12 leads can be used. For example, 20, 30, 40, 50, or even hundreds of sensors can be placed on various portions of a patient's torso. Fewer than 12 leads can also be used. The sensors are preferably non-invasive sensors located on the patient's body surface, but invasive sensors can also be used. With separated signals each corresponding to one of the locations, a physician can review the signals and detect abnormalities that correspond to the respective locations.
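The channel-to-region correspondence described above can be kept in a small lookup table. This sketch assumes the common channel order I, II, III, aVR, aVL, aVF, V1 to V6, which the text does not specify, and it knows only the regions the text names; leads such as aVR, V3 and V4 are reported as not listed. The names `LEAD_ORDER`, `REGION_OF_LEAD` and `component_region` are hypothetical.

```python
# Assumed channel order for a standard 12-lead recording; adjust it to match
# the recorder's actual channel numbering.
LEAD_ORDER = ["I", "II", "III", "aVR", "aVL", "aVF",
              "V1", "V2", "V3", "V4", "V5", "V6"]

# Regions as named in the text
REGION_OF_LEAD = {
    "II": "inferior", "III": "inferior", "aVF": "inferior",
    "V1": "septal", "V2": "septal",
    "V5": "lateral", "V6": "lateral", "I": "lateral", "aVL": "lateral",
}

def component_region(component_number):
    """Map a 1-based component number to its channel's lead and region."""
    lead = LEAD_ORDER[component_number - 1]
    return lead, REGION_OF_LEAD.get(lead, "not listed in the text")

print(component_region(2))   # ('II', 'inferior')
```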

As shown in FIGURE 3B, component #1 represents the pacemaker signals and the early part of the QRS complex signals. Component #2 represents major portions of the later parts of the QRS complex signals. QRS complex signals represent the depolarization of the ventricles. Component #10 represents atrial fibrillation (a type of arrhythmia) signals. Therefore atrial fibrillation is predicted to be located at the sensor location that corresponds to channel #10. Although components #1 and #10 contain similar frequency content of oscillatory activity between heartbeats, they capture activities from different spatial locations.

For EKG signals, we discovered that the signals separated using ICA are usually more independent from each other and have less information redundancy than signals that have not been processed through ICA. Compared to the recorded signals, the separated signals usually better represent the signals from the original sources in the patient's heart. In addition to arrhythmia, the separated cardiac signals can also be used to help detect other heart conditions. For example, the separated signals, especially the separated QRS complex signals, can be used to detect premature ventricular contraction. The separated signals, especially the separated Q wave signals, can be used to detect myocardial infarction. Separating the EKG signals, especially separating the QRS complex and T wave signals, can help distinguish left and right bundle branch block.

Of course, the disclosed system and method are not limited to detecting arrhythmia, or any particular type of disease state. Embodiments of the invention include all methods of analyzing medical signals using ICA. For example, when a pregnant woman undergoes EKG recording, the heart signals from the woman and from the fetus(es) can be separated.

The separated cardiac signals can be characterized as non-random but not easily described by a deterministic formula, which makes them suitable subjects for chaotic analysis. As mentioned above, chaos theory (also called nonlinear dynamics) studies patterns that are not completely random but cannot be determined by simple formulas. The separated signals can be plotted to produce a chaos phase space portrait. By reviewing the patterns in the phase space portrait, including the existence and location of one or more attractors, a user is able to assess the likelihood of abnormality in the signals, which may indicate disease conditions in the patient.

In a preferred embodiment, the QRS complex signals are separated into three different components, with each component representing a portion of the QRS complex. The 3 components are 3 data sets that are found to be temporally statistically independent using independent component analysis. Using the three components, a 3-dimensional phase space portrait of the QRS complex can be displayed to show the trajectory of the three components.

FIGURE 3C is a sample chart of component #10 of the separated signals (as shown in FIGURE 3B) back projected onto the recorded signals of FIGURE 3A. The separated signals of component #10, which indicate arrhythmia, are identified by reference number 302 in FIGURE 3C. The 12 channels of recorded signals are identified by reference number 304 for ease of identification. FIGURE 3C therefore allows direct visual comparison of a separated component against channels of recorded signals. The back projections of cardiac dynamics allow us to examine the amount of information accounted for by single or by multiple components in the recorded signals and to confirm the components' physiological meanings suggested by the surface topography (the aforementioned columns of the inverse of the un-mixing matrix).

FIGURE 4A illustrates the phase space portrait of the EKG recording of a healthy subject. FIGURE 4B illustrates the phase space portrait of the EKG recording of an atrial fibrillation patient. In FIGURES 4A and 4B, the x, y, and z axes represent the amplitudes of the 3 QRS components. The separated signals' values over time are plotted to produce the phase space portraits. In the healthy EKG recording of FIGURE 4A, the dense cluster 402 indicates the existence of an attractor that attracts the signal values to the region of the dense cluster 402. The dense cluster 402 represents the most frequent occurrences of the signals. In the atrial fibrillation patient EKG recording of FIGURE 4B, an additional loop 404, which is not part of the dense cluster 402, lies below the attractor and the dense cluster 402 and closer to the base plane than the dense cluster 402. This additional loop 404 is presumably due to the oscillatory activity in the baseline portions of the EKG signals. The separated component #10 signal, which indicates an arrhythmia condition, is presumably responsible for the additional loop 404. The visual pattern can be compared with the visual pattern of a healthy subject and manually recognized as indicative of an abnormal condition such as atrial fibrillation.

Instead of the 3 QRS complex components as shown in FIGURE 4B, other components or 
more than 3 components can also be used to plot the chaos phase space portrait. If more than 3 
components are used, the different components can be plotted in different colors. The 3 QRS complex components of FIGURE 4B are selected because test results suggest that such a phase space portrait is physiologically significant and usually functions well as an indication of a patient's heart condition.

Although FIGURES 3A, 3B, 4A and 4B were produced using test results related to the 
detection and localization of focal atrial fibrillation, the disclosed systems and methods can be used
to detect and to localize other heart conditions, including focal and re-entrant arrhythmia. The
disclosed systems and methods can also be used to detect and to localize paroxysmal atrial 
fibrillation as well as persistent and chronic atrial fibrillation. 

The disclosed methods can be used to improve existing implantable cardioverter/defibrillators (ICDs) that can deliver electrical stimuli to the heart. In addition to existing ICDs and existing pacemakers, some existing cardiac rhythm management devices also combine the functions of pacemakers and ICDs. A computing module embodying the disclosed methods can be added to the existing systems to separate the recorded cardiac signals. The separated signals are then used by the cardiac rhythm management systems to detect or to predict abnormal conditions. Upon detection or prediction, the cardiac rhythm management system automatically treats the patient, for example by delivering pharmacologic agents, pacing the heart in a particular mode, delivering cardioversion/defibrillation shocks to the heart, or applying neural stimulation to the sympathetic or parasympathetic branches of the autonomic nervous system. Instead of or in addition to automatic treatment, the system can also issue a warning to a physician, a nurse or the patient. The warning can be issued in the form of an audio signal, a radio signal, and so forth. The disclosed signal separation methods can be used in cardiac rhythm management systems in hospitals, in patients' homes or nursing homes, or in ambulances. The cardiac rhythm management systems include implantable cardioverter defibrillators, pacemakers, biventricular or other multi-site coordination devices and other systems for diagnostic EKG processing and analysis. The cardiac rhythm management systems also include automatic external defibrillators and other external monitors, programmers and recorders.

In one embodiment, an improved cardiac rhythm management system includes a storage module that stores the separated signals. In one arrangement, the storage module can be removed from the cardiac rhythm management system and connected to a computing device. In another arrangement, the storage module is directly connected to a computing device without being removed from the cardiac rhythm management system. The computing device can provide further analysis of the separated signals, for example displaying a chaos phase space portrait using some of the separated signals. The computing device can also store the separated signals to provide a history of the patient's cardiac signals.

The disclosed methods can also be applied to predict the occurrence of arrhythmia within a patient's heart. After separating recorded EKG signals into separated signals, the separated signals can be matched with stored triggers and diagnoses as described above. If the separated signals match stored triggers that are associated with arrhythmia, an occurrence of arrhythmia is predicted. In other embodiments, an arrhythmia probability is then calculated, for example based on how closely the separated signals match the stored triggers, based on records of how frequently the patient's separated signals have matched the stored triggers in the past, and/or based on how frequently the patient has actually suffered arrhythmia in the past. The calculated probability can then be used to predict when the next arrhythmia will occur for the patient. Based on statistics and clinical data, calculated probabilities can be associated with specified time periods within which an arrhythmia will occur.
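The text does not fix a matching metric or probability model, so the following is only a hypothetical sketch of the idea. It assumes Pearson correlation as the measure of "how closely" a separated-signal segment matches a stored trigger, a 0.9 match threshold, and a Laplace-smoothed frequency of past matches that were followed by arrhythmia. All names and numbers are illustrative assumptions.

```python
import math
from typing import Dict, Sequence, Tuple


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length segments (0.0 if either is flat)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0


def best_trigger_match(segment: Sequence[float],
                       triggers: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Name and score of the stored trigger that best matches the segment."""
    return max(((name, correlation(segment, t)) for name, t in triggers.items()),
               key=lambda item: item[1])


def predicts_arrhythmia(segment: Sequence[float],
                        triggers: Dict[str, Sequence[float]],
                        threshold: float = 0.9) -> bool:
    """Hypothetical rule: predict arrhythmia if a stored trigger matches closely enough."""
    return best_trigger_match(segment, triggers)[1] >= threshold


def arrhythmia_probability(past_matches: int, past_episodes_after_match: int) -> float:
    """Laplace-smoothed fraction of past trigger matches that were followed by arrhythmia."""
    return (past_episodes_after_match + 1) / (past_matches + 2)


# Toy triggers and segment, not clinical data.
triggers = {"oscillatory": [0, 1, 0, -1, 0, 1, 0, -1], "spike": [0, 0, 0, 0, 0, 0, 0, 1]}
segment = [0, 2, 0, -2, 0, 2, 0, -2]
print(best_trigger_match(segment, triggers))  # ('oscillatory', 1.0)
print(predicts_arrhythmia(segment, triggers))  # True
print(arrhythmia_probability(8, 3))            # 0.4
```

Correlation ignores amplitude scaling, which suits separated components whose scale is arbitrary; associating a probability with a time window, as the text describes, would require clinical data that this sketch does not model.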

In addition to EKG signals, the disclosed systems and methods can be applied to separate other electrical signals such as electroencephalogram signals, electromyographic signals, electrodermographic signals, and electroneurographic signals. They can be applied to separate other types of signals, such as sonic signals, optic signals, pressure signals, magnetic signals and chemical signals. The disclosed systems and methods can be applied to separate signals from internal sources, for example within a cardiac chamber, within a blood vessel, and so forth. The disclosed systems and methods can be applied to separate signals from external sources such as the skin surface or away from the body. They can also be applied to record and to separate signals from animal subjects.

Although the foregoing has described certain preferred embodiments, other embodiments will be apparent to those of ordinary skill in the art from the disclosure herein. Additionally, other combinations, omissions, substitutions and modifications will be apparent to the skilled artisan in view of the disclosure herein. Accordingly, the present invention is not to be limited by the preferred embodiments, but is to be defined by reference to the following claims.

The present application incorporates by reference, in its entirety, U.S. Patent No. 5,706,402, titled "Blind signal processing system employing information maximization to recover unknown signals through unsupervised minimization of output redundancy", filed November 28, 1994, as an APPENDIX as follows.

United States Patent No. 5,706,402
Inventor: Anthony J. Bell 

Blind signal processing system employing information maximization to recover 
unknown signals through unsupervised minimization of output redundancy 

ABSTRACT



A neural network system and unsupervised learning process 
for separating unknown source signals from their received 
mixtures by solving the Independent Components Analysis 
(ICA) problem. The unsupervised learning procedure solves 
the general blind signal processing problem by maximizing 
joint output entropy through gradient ascent to minimize 
mutual information in the outputs. The neural network 
system can separate a multiplicity of unknown source sig- 
nals from measured mixture signals where the mixture 
characteristics and the original source signals are both 
unknown. The system can be easily adapted to solve the 
related blind deconvolution problem that extracts an 
unknown source signal from the output of an unknown 
reverberating channel. 

15 Claims, 8 Drawing Sheets 

BLIND SIGNAL PROCESSING SYSTEM EMPLOYING INFORMATION MAXIMIZATION TO RECOVER UNKNOWN SIGNALS THROUGH UNSUPERVISED MINIMIZATION OF OUTPUT REDUNDANCY

REFERENCE TO GOVERNMENT RIGHTS

The U.S. Government has rights in the invention disclosed and claimed herein pursuant to Office of Naval Research grant no. N00014-934-0631.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates generally to systems for recovering the original unknown signals subjected to transfer through an unknown multichannel system by processing the known output signals therefrom, and relates specifically to an information-maximizing neural network that uses unsupervised learning to recover each of a multiplicity of unknown source signals in a multichannel having reverberation.

2. Description of the Related Art

Blind Signal Processing: In many signal processing applications, the sample signals provided by the sensors are mixtures of unknown sources, and the task is to recover the sources from these known mixtures. Generally, the signal sources as well as their mixture characteristics are unknown. Without knowledge of the signal sources other than the general statistical assumption of source independence, this signal processing problem is known in the art as the "blind source separation problem". The separation is "blind" because nothing is known about the statistics of the independent source signals and nothing is known about the mixing process.

The blind separation problem is encountered in many familiar forms. For instance, the well-known "cocktail party" problem refers to a situation where the unknown (source) signals are sounds generated in a room and the known (sensor) signals are the outputs of several microphones. Each of the source signals is delayed and attenuated in some (time-varying) manner during transmission from source to microphone, where it is then mixed with other independently delayed and attenuated source signals, including multipath versions of itself (reverberation), which are delayed versions arriving from different directions.

This signal processing problem arises in many contexts other than the simple situation where each of two unknown mixtures of two speaking voices reaches one of two microphones. Other examples include the separation of radio or radar signals sensed by an array of antennas, the separation of odors in a mixture by a sensor array, the parsing of the environment into separate objects by our biological visual system, and the separation of biomagnetic sources by a superconducting quantum interference device (SQUID) array in magnetoencephalography. Other important examples of the blind source separation problem include sonar array signal processing and signal decoding in cellular telecommunication systems.

The blind source separation problem is closely related to the more familiar "blind deconvolution" problem, where a single unknown source signal is extracted from a known mixed signal that includes many time-delayed versions of the source originating from unknown multipath distortion or reverberation (self-convolution). The need for blind deconvolution or "blind equalization" arises in a number of important areas such as data transmission, acoustic reverberation cancellation, seismic deconvolution and image restoration. Adaptive equalization for data transmission, for instance, can operate either in a training mode that transmits a known training sequence to establish deconvolution parameters or in a blind mode.

The class of communication systems that may need blind equalization capability includes high-capacity line-of-sight digital radio (cellular telecommunications). Such a channel suffers from anomalous propagation conditions, which can degrade digital radio performance by causing the transmitted signal to propagate along several paths of different electrical length (multipath fading). Severe multipath fading requires a blind equalization scheme to recover channel operation.

In reflection seismology, a reflection coefficient sequence can be blindly extracted from the received signal, which includes echoes produced at the different reflection points of the unknown geophysical model. The traditional linear-predictive seismic deconvolution method used to remove the source waveform from a seismogram ignores valuable phase information contained in the reflection seismogram. This limitation can be overcome by using blind deconvolution with a statistical geological reflection coefficient model. Blind deconvolution can also be used to recover unknown images that are blurred by transmission through unknown systems.

Blind Separation Methods: Because of the fundamental importance of both the blind separation and blind deconvolution signal processing problems, practitioners have proposed several classes of methods for solving the problems. The blind separation problem was first addressed in 1986 by Jutten and Herault ("Blind separation of sources. Part I: An adaptive algorithm based on neuromimetic architecture", Signal Processing 24 (1991) 1-10), who disclose the HJ neural network with backward connections that can usually solve the simple two-element blind source separation problem. Disadvantageously, the HJ network iterations may not converge to a proper solution in some cases, depending on the initial state and on the source statistics. When convergence is possible, the HJ network appears to converge in two stages, the first of which quickly decorrelates the two output signals and the second of which more slowly provides the statistical independence necessary to separate the unknown sources. Comon et al. ("Blind separation of sources, Part II: ...") ... the output signals, thereby achieving some degree of statistical independence by minimizing higher-order statistics among the known sensor signals.

Other practitioners have attempted to improve the HJ network to remove some of the disadvantageous features. For instance, Sorouchyari ("Blind separation of sources. Part III: Stability analysis", Signal Processing 24 (1991) 21-29) examines higher-order non-linear transforming functions other than the simple first- and third-order functions proposed by Jutten et al., but concludes that the higher-order functions cannot improve implementation of the HJ network. In application Ser. No. 08/074,940, fully incorporated herein by this reference, Li et al. describe a blind source separation system based on the HJ neural network model
that employs linear beamforming to improve HJ network separation performance. Also, John C. Platt et al. ("Networks For The Separation of Sources That Are Superimposed and Delayed", Advances In Neural Information Processing Systems, vol. 4, Morgan-Kaufmann, San Mateo, 1992) propose extending the original magnitude-optimizing HJ network to estimate a matrix of time delays in addition to the HJ magnitude mixing matrix. Platt et al. observe that their modified network is disadvantaged by multiple stable states and unpredictable convergence.

Pierre Comon ("Independent component analysis, a new concept?", Signal Processing 36 (1994) 287-314) provides a detailed discussion of Independent Component Analysis (ICA), which defines a class of closed-form techniques useful for solving the blind identification and deconvolution problems. As is known in the art, ICA searches for a transformation matrix to minimize the statistical dependence among components of a random vector. This is distinguished from Principal Components Analysis (PCA), which searches for a transformation matrix to minimize statistical correlation among components of a random vector, a solution that is inadequate for the blind separation problem. Thus, PCA can be applied to minimize second-order cross-moments among a vector of sensor signals, while ICA can be applied to minimize sensor signal joint probabilities, which offers a solution to the blind separation problem. Comon suggests that although mutual information is an excellent measure of the contrast between joint probabilities, it is not practical because of computational complexity. Instead, Comon teaches the use of the fourth-order cumulant tensor (thereby ignoring fifth-order and higher statistics) as a preferred measure of contrast because the associated computational complexity increases only as the fifth power of the number of unknown signals.

Similarly, Gilles Burel ("Blind separation of sources: A nonlinear neural algorithm", Neural Networks 5 (1992) 937-947) asserts that the blind source separation problem is nothing more than the Independent Components Analysis (ICA) problem. However, Burel proposes an iterative scheme for ICA employing a back-propagation neural network for blind source separation that handles non-linear mixtures through iterative minimization of a cost function. Burel's network differs from the HJ network, which does not minimize any cost function. Like the HJ network, Burel's system can separate the source signals in the presence of noise without attempting noise reduction (no noise hypotheses are assumed). Also, like the HJ system, practical convergence is not guaranteed because of the presence of local minima and computational complexity. Burel's system differs sharply from traditional supervised back-propagation applications because his cost function is not defined in terms of the difference between measured and desired outputs (the desired outputs are unknown). His cost function is instead based on output signal statistics alone, which permits "unsupervised" learning in his network.

Blind Deconvolution Methods: The blind deconvolution art can be appreciated with reference to the text edited by Simon Haykin (Blind Deconvolution, Prentice-Hall, New Jersey, 1994), which discusses four general classes of blind deconvolution techniques, including Bussgang processes, higher-order cumulant equalization, polyspectra and maximum likelihood sequence estimation. Haykin neither considers nor suggests specific neural network techniques suitable for application to the blind deconvolution problem.

Blind deconvolution is an example of "unsupervised" learning in the sense that it learns to identify the inverse of an unknown linear time-invariant system without any physical access to the system input signal. This unknown system may be a nonminimum-phase system having one or more zeroes outside the unit circle in the frequency domain. The blind deconvolution process must identify both the magnitude and the phase of the system transfer function. Although identification of the magnitude component requires only the second-order statistics of the system output signal, identification of the phase component is more difficult because it requires the higher-order statistics of the output signal. Accordingly, some form of non-linearity is needed to extract the higher-order statistical information contained in the magnitude and phase components of the output signal. Such non-linearity is useful only for unknown source signals having non-Gaussian statistics. There is no solution to the problem when the input source signal is Gaussian-distributed and the channel is nonminimum-phase, because all polyspectra of Gaussian processes of order greater than two are identically zero.

Classical adaptive deconvolution methods are based almost entirely on second-order statistics, and thus fail to operate correctly for nonminimum-phase channels unless the input source signal is accessible. This failure stems from the inability of second-order statistics to distinguish minimum-phase information from maximum-phase information of the channel. A minimum-phase system (having all zeroes within the unit circle in the frequency domain) exhibits a unique relationship between its amplitude response and phase response, so that second-order statistics in the output signal are sufficient to recover both amplitude and phase information for the input signal. In a nonminimum-phase system, second-order statistics of the output signal alone are insufficient to recover phase information and, because the system does not exhibit a unique relationship between its amplitude response and phase response, blind recovery of source signal phase information is not possible without exploiting higher-order output signal statistics. These require some form of non-linear processing because linear processing is restricted to the extraction of second-order statistics.

Bussgang techniques for blind deconvolution can be viewed as iterative polyspectral techniques, where rationales are developed for choosing the polyspectral orders with which to work and their relative weights by subtracting a source signal estimate from the sensor signal output. The Bussgang techniques can be understood with reference to Sandro Bellini ("Ch. 2: Bussgang Techniques For Blind Deconvolution and Equalization", Blind Deconvolution, S. Haykin (ed.), Prentice Hall, Englewood Cliffs, N.J., 1994), who characterizes the Bussgang process as a class of processes having an auto-correlation function equal to the cross-correlation of the process with itself as it exits from a zero-memory non-linearity.

Polyspectral techniques for blind deconvolution lead to unbiased estimates of the channel phase without any information about the probability distribution of the input source signals. The general class of polyspectral solutions to the blind deconvolution problem can be understood with reference to a second Simon Haykin textbook ("Ch. 20: Blind Deconvolution", Adaptive Filter Theory, Second Ed., Simon Haykin (ed.), Prentice Hall, Englewood Cliffs, N.J., 1991) and to Hatzinakos et al. ("Ch. 5: Blind Equalization Based on Higher Order Statistics (HOS)", Blind Deconvolution, Simon Haykin (ed.), Prentice Hall, Englewood Cliffs, N.J., 1994).

Thus, the approaches in the art to the blind separation and deconvolution problems can be classified as those using non-linear transforming functions to spin off higher-order
statistics (Jutten et al. and Bellini) and those using explicit calculation of higher-order cumulants and polyspectra (Haykin and Hatzinakos et al.). The HJ network does not reliably converge even for the simplest two-source problem, and the fourth-order cumulant tensor approach does not reliably converge because of truncation of the cumulant expansion. There is accordingly a clearly-felt need for blind signal processing methods that can reliably solve the blind processing problem for significant numbers of source signals.

Unsupervised Learning Methods: In the biological sensory system arts, practitioners have formulated neural training optimality criteria based on studies of biological sensory neurons, which are known to solve blind separation and deconvolution problems of many kinds. The class of supervised learning techniques normally used with artificial neural networks is not useful for these problems because supervised learning requires access to the source signals for training purposes. Unsupervised learning instead requires internally creating the necessary teaching signals without access to the source signals.

Practitioners have proposed several rationales for unsupervised learning in biological sensory systems. For instance, Linsker ("An Application of the Principle of Maximum Information Preservation to Linear Systems", Advances in Neural Information Processing Systems 1, D. S. Touretzky (ed.), Morgan-Kaufmann) shows that the "infomax" principle (first proposed in 1987) explains why biological sensory systems operate to minimize information loss between neural layers in the presence of noise. In a later work ("Local Synaptic Learning Rules Suffice to Maximize Mutual Information in a Linear Network", Neural Computation 4 (1992) 691-702), Linsker describes a two-phase learning algorithm for maximizing the mutual information between two layers of a neural network. However, Linsker assumes a linear input-output transforming function and multivariate Gaussian statistics for both source signals and noise components. With these assumptions, Linsker shows that a "local synaptic" (biological) learning rule is sufficient to maximize mutual information, but he neither considers nor suggests solutions to the more general blind processing problem of recovering non-Gaussian source signals in a non-linear transforming environment.

Simon Haykin ("Ch. 11: Self-Organizing Systems III: Information-Theoretic Models", Neural Networks: A Comprehensive Foundation, S. Haykin (ed.), MacMillan, New York, 1994) discusses Linsker's infomax principle, which is independent of the neural network learning rule used in its implementation. Haykin also discusses other well-known principles such as the "minimization of information loss" principle suggested in 1988 by Plumbley et al. and Barlow's "principle of minimum redundancy", first proposed in 1961, either of which can be used to derive a class of unsupervised learning rules.

Joseph Atick ("Could information theory provide an ecological theory of sensory processing?", Network 3 (1992)) applies Shannon's information theory to the neural processes seen in biological optical sensors. Atick observes that information redundancy is useful only for noise suppression and includes two components: (a) unused channel capacity arising from suboptimal symbol frequency distribution and (b) intersymbol redundancy or mutual information. Atick suggests that the optical neuron apparently evolved to minimize the troublesome intersymbol redundancy (mutual information) component of redundancy rather than to minimize total redundancy. H. B. Barlow ("Unsupervised Learning", Neural Computation 1 (1989) 295-311) also examines this issue and shows that "minimum entropy coding" in a biological sensory system operates to reduce the troublesome mutual information component even at the expense of suboptimal symbol frequency distribution. Barlow shows that the mutual information component of redundancy can be minimized in a neural network by feeding each neuron output back to other neuron inputs through anti-Hebbian synapses to discourage correlated output activity. This "redundancy reduction" principle is offered to explain how unsupervised perceptual learning occurs in animals.

S. Laughlin ("A Simple Coding Procedure Enhances a Neuron's Information Capacity", Z. Naturforsch. 36 (1981) 910-912) proves that the optical neuron of a blowfly optimizes information capacity through equalization of the probability distribution for each neural code value (minimizing the unused channel capacity component of redundancy), thereby confirming Barlow's "minimum redundancy" principle. J. J. Hopfield ("Olfactory computation and object perception", Proc. Natl. Acad. Sci. USA 88 (August 1991) 6462-...) examines the source separation solution in neurons using the HJ neuron model for minimizing output redundancy.

Becker et al. ("Self-organizing neural network that discovers surfaces in random-dot stereograms", Nature, vol. 355, pp. 161-163, Jan. 9, 1992) propose a standard back-propagation neural network learning model modified to replace the external teacher (supervised learning) by internally-derived teaching signals (unsupervised learning). Becker et al. use non-linear networks to maximize mutual information between different sets of outputs, contrary to the blind signal recovery requirement. By increasing redundancy, their network discovers invariance in separate groups of inputs, which can be selected out of information passed forward to improve processing efficiency.

Thus, it is known in the neural network arts that anti-Hebbian mutual interaction can be used to explain the decorrelation or minimization of redundancy observed in biological vision systems. This can be appreciated with reference to H. B. Barlow et al. ("Adaptation and Decorrelation in the Cortex", The Computing Neuron, R. Durbin et al. (eds.), Addison-Wesley, 1989) and to Schraudolph et al. ("Competitive Anti-Hebbian Learning of Invariance", Advances in Neural Information Processing Systems 4, J. E. Moody et al. (eds.), Morgan-Kaufmann). Practitioners have also suggested that Linsker's infomax principle and Barlow's "minimum redundancy" principle may both yield the same neural network learning procedures. Until now, however, non-linear versions of these principles applicable to the blind signal processing problem have been unknown in the art.

The Blind Processing Problem: As mentioned above, blind source separation and blind deconvolution are related problems in signal processing. The blind source separation problem can be succinctly stated as follows: a set of unknown source signals $S_1(t), \ldots, S_N(t)$ are mixed together linearly by an unknown matrix $[A_{ij}]$. Nothing is known about the sources or the mixing process, both of which may be time-varying, although the mixing process is assumed to vary slowly with respect to the sources. The blind separation task is to recover the original source signals from the $N$ measured superpositions of them, $X_1(t), \ldots, X_N(t)$, by finding a square matrix $[W_{ij}]$ that is a permutation of the inverse of the unknown matrix $[A_{ij}]$. The blind deconvolution problem can be similarly stated: a single unknown signal $S(t)$ is convolved with an unknown tapped delay-line filter $A_1, \ldots, A_K$, producing the corrupted measured signal $X(t) = A(t) * S(t)$, where $A(t)$ is the impulse response of the
unknown (perhaps slowly time-varying) filter. The blind deconvolution task is to recover $S(t)$ by finding and convolving $X(t)$ with a tapped delay-line filter $W_1, \ldots, W_L$ having the impulse response $W(t)$ that reverses the effect of the unknown filter $A(t)$.
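The two problem statements can be made concrete with a toy example. This sketch assumes the mixing matrix and the filter are known, so it is not blind: it only shows what a correct $[W_{ij}]$ or $W(t)$ must achieve. The helper names `mix` and `fir_filter`, and all numbers, are illustrative assumptions.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def mix(A: Matrix, S: Matrix) -> Matrix:
    """Instantaneous linear mixing: X_i(t) = sum_j A[i][j] * S_j(t)."""
    return [[sum(A[i][j] * S[j][t] for j in range(len(S)))
             for t in range(len(S[0]))] for i in range(len(A))]


def fir_filter(taps: Sequence[float], s: Sequence[float]) -> List[float]:
    """Tapped delay line: x(t) = sum_k taps[k] * s(t - k), with s(t) = 0 for t < 0."""
    return [sum(taps[k] * s[t - k] for k in range(len(taps)) if t - k >= 0)
            for t in range(len(s))]


# Separation: a W equal to a row permutation of A^-1 recovers the sources, here in swapped order.
S = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]
A = [[2.0, 1.0], [0.0, 1.0]]   # asymmetric mixing; A^-1 = [[0.5, -0.5], [0, 1]]
W = [[0.0, 1.0], [0.5, -0.5]]  # rows of A^-1 swapped
print(mix(W, mix(A, S)))       # [[0.0, 3.0, 1.0], [1.0, 0.0, 2.0]]

# Deconvolution: A(t) = [1, 0.5] is minimum-phase, so its inverse W(t) = (-0.5)^k is
# causal and stable; 20 taps reproduce the inverse exactly for this 5-sample signal.
s = [1.0, 2.0, 0.0, -1.0, 3.0]
x = fir_filter([1.0, 0.5], s)
w = [(-0.5) ** k for k in range(20)]
print(fir_filter(w, x))        # [1.0, 2.0, 0.0, -1.0, 3.0]
```

The separation case shows why only a permutation of the inverse can be expected: the outputs carry no label tying them to a particular source. The deconvolution case uses a minimum-phase filter because its inverse is causal; for a nonminimum-phase filter, no causal, stable inverse of this form exists.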

There are many similarities between the two problems. In one, source signals are corrupted by the superposition of other source signals and, in the other, a single source signal is corrupted by superposition of time-delayed versions of itself. In both cases, unsupervised learning is required because no error signals are available and no training signals are provided. In both cases, second-order statistics alone are inadequate to solve the more general problem. For instance, a second-order decorrelation technique such as that proposed by Barlow et al. would find uncorrelated (linearly independent) projections $[Y_i]$ of the input sensor signals $[X_i]$ when attempting to separate unknown source signals $\{S_i\}$, but is limited to discovering a symmetric decorrelation matrix that cannot reverse the effects of the mixing matrix $[A_{ij}]$ if the mixing matrix is asymmetric. Similarly, second-order decorrelation techniques based on the autocorrelation function, such as prediction-error filters, are phase-blind and do not offer sufficient information to estimate the phase characteristics of the corrupting filter $A(t)$ when applied to the more general blind deconvolution problem.
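The limitation of symmetric second-order decorrelation can be checked numerically. This is a minimal sketch that assumes two independent unit-variance sources, so that the sensor covariance is $C = AA^T$, and an asymmetric 2 x 2 mixing matrix. The symmetric decorrelating matrix $W = C^{-1/2}$ makes the outputs uncorrelated, yet $WA$ turns out to be a rotation rather than a scaled permutation, so each output is still a mixture of both sources. Function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def symmetric_whitener_2x2(c: Matrix) -> Matrix:
    """C^(-1/2) for a 2x2 symmetric positive-definite covariance C.

    Uses sqrt(C) = (C + s I) / t with s = sqrt(det C), t = sqrt(trace C + 2 s),
    which follows from the Cayley-Hamilton theorem, then inverts that 2x2 matrix.
    """
    s = math.sqrt(c[0][0] * c[1][1] - c[0][1] * c[1][0])
    t = math.sqrt(c[0][0] + c[1][1] + 2.0 * s)
    r = [[(c[0][0] + s) / t, c[0][1] / t], [c[1][0] / t, (c[1][1] + s) / t]]
    d = r[0][0] * r[1][1] - r[0][1] * r[1][0]
    return [[r[1][1] / d, -r[0][1] / d], [-r[1][0] / d, r[0][0] / d]]


A = [[1.0, 1.0], [0.0, 1.0]]                     # asymmetric mixing matrix
W = symmetric_whitener_2x2(matmul(A, transpose(A)))
G = matmul(W, A)                                 # overall source-to-output map
GGt = matmul(G, transpose(G))                    # output covariance
print(all(abs(GGt[i][j] - (1.0 if i == j else 0.0)) < 1e-9
          for i in range(2) for j in range(2)))  # True: outputs are decorrelated
print([[round(v, 3) for v in row] for row in G])  # [[0.894, 0.447], [-0.447, 0.894]]
```

Because any further rotation of the decorrelated outputs leaves them decorrelated, second-order statistics cannot single out the unmixing matrix. Higher-order statistics are needed to remove the residual rotation.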

Thus, both blind signal processing problems require the 
use of higher-order statistics as well as certain assumptions 
regarding source signal statistics. For the blind separation 
problem, the sources are assumed to be statistically independent and non-Gaussian. With this assumption, the problem of learning $[W_{ij}]$ becomes the ICA problem described by Comon. For blind deconvolution, the original signal $S(t)$ is assumed to be a "white" process consisting of independent symbols. The blind deconvolution problem then becomes the problem of removing from the measured signal $X(t)$ any statistical dependencies across time that are introduced by the corrupting filter $A(t)$. This process is sometimes denominated the "whitening" of $X(t)$.

As used herein, both the ICA procedure and the "whitening" of a time series are denominated "redundancy reduction". The first class of techniques uses some type of explicit estimation of cumulants and polyspectra, which can be appreciated with reference to Haykin and Hatzinakos et al. Disadvantageously, such "brute force" techniques are computationally intensive for high numbers of sources or taps and may be inaccurate when cumulants higher than fourth order are ignored, as they usually must be. The second class of techniques uses static non-linear functions, the Taylor series expansions of which yield higher-order terms. Iterative learning rules containing such terms are expected to be somehow sensitive to the particular higher-order statistics necessary for accurate redundancy reduction. This reasoning is used by Comon et al. to explain the HJ network and by Bellini to explain the Bussgang deconvolver. Disadvantageously, there is no assurance that the particular higher-order statistics yielded by the (heuristically) selected non-linear function are weighted in the manner necessary for achieving statistical independence. Recall that the known approach to attempting improvement of the HJ network is to test various non-linear functions selected heuristically, and that the original functions have not yet been improved upon in the art.

Accordingly, there is a need in the art for an improved blind processing method, such as some method of rigorously linking a static non-linearity to a learning rule that performs gradient ascent in some parameter guaranteed to be usefully related to statistical dependency. Until now, this was believed to be practically impossible because of the infinite number of higher-order statistics associated with statistical dependency. The related unresolved problems and deficiencies are clearly felt in the art and are solved by this invention in the manner described below.

SUMMARY OF THE INVENTION 

This invention solves the above problem by introducing a new class of unsupervised learning procedures for a neural network that solve the general blind signal processing problem by maximizing joint input/output entropy through gradient ascent to minimize mutual information in the outputs. The network of this invention arises from the unexpectedly advantageous observation that a particular type of non-linear signal transform creates learning signals with the higher-order statistics needed to separate unknown source signals by minimizing mutual information among neural network output signals. This invention also arises from the second unexpectedly advantageous discovery that mutual information among neural network outputs can be minimized by maximizing joint output entropy when the learning transform is selected to match the signal probability distributions of interest.

The process of this invention can be appreciated as a generalization of the infomax principle to non-linear units with arbitrarily distributed inputs uncorrupted by any known noise sources. It is a feature of the system of this invention that each measured input signal is passed through a predetermined sigmoid function to adaptively maximize information transfer by optimal alignment of the monotonic sigmoid slope with the input signal peak probability density. It is an advantage of this invention that redundancy is minimized among a multiplicity of outputs merely by maximizing total information throughput, thereby producing the independent components needed to solve the blind separation problem.

The foregoing, together with other objects, features and advantages of this invention, can be better appreciated with reference to the following specification, claims and the accompanying drawing.

BRIEF DESCRIPTION OF THE DRAWING 

For a more complete understanding of this invention, reference is now made to the following detailed description of the embodiments as illustrated in the accompanying drawing, wherein:

FIGS. 1A, 1B, 1C and 1D illustrate the feature of sigmoidal transfer function alignment for optimal information flow in a sigmoidal neuron from the prior art;

FIGS. 2A, 2B and 2C illustrate the blind source separation and blind deconvolution problems from the prior art;

FIGS. 3A, 3B and 3C provide graphical diagrams illustrating a joint entropy maximization example where maximizing joint entropy fails to produce statistically independent output signals because of improper selection of the non-linear transforming function;

FIG. 4 shows the theoretical relationship between the several entropies and mutual information from the prior art;

FIG. 5 shows a functional block diagram of an illustrative embodiment of the source separation network of this invention;

FIG. 6 is a functional block diagram of an illustrative embodiment of the blind decorrelating network of this invention;

FIG. 7 is a functional block diagram of an illustrative embodiment of the combined blind source separation and blind decorrelation network of this invention;
FIGS. 8A, 8B and 8C show typical probability density functions for speech, rock music and Gaussian white noise;

FIGS. 9A and 9B show typical spectra of a speech signal before and after decorrelation is performed according to the procedure of this invention;

FIG. 10 shows the results of a blind source separation experiment performed using the procedure of this invention; and

FIGS. 11A, 11B, 11C, 11D, 11E, 11F, 11G, 11H, 11I, 11J, 11K and 11L show time domain filter charts illustrating the results of the blind deconvolution of several different corrupted human speech signals according to the procedure of this invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

This invention arises from the unexpectedly advantageous observation that a class of unsupervised learning rules for maximizing information transfer in a neural network solves the blind signal processing problem by minimizing redundancy in the network outputs. This class of new learning rules is now described in information theoretic terms, first for a single input and then for a multiplicity of unknown input signals.

Information Maximization For a Single Source

In a single-input network, the mutual information that the output y of the network contains about its input x can be expressed as:

$$I(y,x) = H(y) - H(y \mid x) \qquad \text{[Eqn. 1]}$$

where H(y) is the entropy of the output signal, H(y|x) is that portion of the output signal entropy that did not come from the input signal, and I(y,x) is the mutual information. Eqn. 1 can be appreciated with reference to FIG. 4, which illustrates the well-known relationship between input signal entropy H(x), output signal entropy H(y) and mutual information I(y,x).

When there is no noise, or when the noise is treated as merely another unknown input signal, the mapping between input x and output y is deterministic and the conditional entropy H(y|x) has its lowest possible value, diverging to minus infinity. This divergence is a consequence of the generalization of information theory to continuous random variables. The output entropy H(y) is really the differential entropy of output signal y with respect to some reference, such as the noise level or the granularity of the discrete representation of the variables in x and y. These theoretical complexities can be avoided by restricting consideration to the gradient of the information theoretic quantities with respect to some parameter w. Such gradients are as well-behaved as are discrete-variable entropies because the reference terms involved in the definition of differential entropies disappear. In particular, Eqn. 1 can be differentiated to obtain the following:

$$\frac{\partial}{\partial w} I(y,x) = \frac{\partial}{\partial w} H(y) \qquad \text{[Eqn. 2]}$$

because, in the noiseless case, H(y|x) does not depend on w and its differential disappears. Thus, for continuous deterministic mappings, the mutual information between network input and network output can be maximized by maximizing the gradient of the entropy of the output alone, which is an unexpectedly advantageous consequence of treating noise as another unknown source signal. This permits the discussion to continue without knowledge of the input signal statistics.

Referring to FIG. 1A, when a single input x is passed through a transforming function g(x) to give an output variable y, both I(y,x) and H(y) are maximized when the high density parts of the probability density function of x are aligned with the steeply sloping parts of the non-linear transforming function g(x). This is equivalent to the alignment of a neuron input-output function to the expected distribution of incoming signals that leads to optimal information flow in the sigmoidal neurons shown in FIGS. 1C-1D. FIG. 1D shows a distribution matched to the sigmoid function in FIG. 1C. In FIG. 1A, the input x having a probability distribution f_x(x) is passed through the non-linear sigmoidal function g(x) to produce output signal y having a probability distribution f_y(y). The information in the probability density function f_y(y) varies responsive to the alignment of the mean and variance of x with respect to the threshold w_0 and slope w of g(x). When g(x) is monotonically increasing or decreasing (thereby having a unique inverse), the output signal probability density function f_y(y) can be written as a function of the input signal probability density function f_x(x) as follows:

$$f_y(y) = \frac{f_x(x)}{\left|\partial y / \partial x\right|} \qquad \text{[Eqn. 3]}$$

where |·| denotes absolute value.

Eqn. 3 leads to the unexpected discovery of an advantageous gradient ascent process, because the output signal entropy can be expressed in terms of the output signal probability density function as follows:

$$H(y) = -E\bigl[\ln f_y(y)\bigr] = -\int_{-\infty}^{\infty} f_y(y)\,\ln f_y(y)\,dy \qquad \text{[Eqn. 4]}$$

where E[ ] denotes expected value. Substituting Eqn. 3 into Eqn. 4 produces the following:

$$H(y) = E\left[\ln\left|\frac{\partial y}{\partial x}\right|\right] - E\bigl[\ln f_x(x)\bigr] \qquad \text{[Eqn. 5]}$$

The second term on the right side of Eqn. 5 is simply the unknown input signal entropy H(x), which cannot be affected by any changes in the parameter w that defines the non-linear function g(x). Therefore, only the first term on the right side of Eqn. 5 need be maximized to maximize the output signal entropy H(y). This first term is the average logarithm of the effect of input signal x on output signal y. It may be maximized by considering the input signals as a "training set" with density f_x(x) and deriving an online, stochastic gradient ascent learning rule expressed as:

$$\Delta w \propto \frac{\partial H}{\partial w} = \frac{\partial}{\partial w}\left(\ln\left|\frac{\partial y}{\partial x}\right|\right) = \left(\frac{\partial y}{\partial x}\right)^{-1}\frac{\partial}{\partial w}\left(\frac{\partial y}{\partial x}\right) \qquad \text{[Eqn. 6]}$$

Eqn. 6 defines a scaling measure Δw for changing the parameter w to adjust the log of the slope of the sigmoid function. Any sigmoid function can be used to specify measure Δw, such as the widely-used logistic transfer function:

$$y = \frac{1}{1 + e^{-u}}, \quad \text{where } u = wx + w_0 \qquad \text{[Eqn. 7]}$$

in which the input x is first aligned with the sigmoid function through multiplication by a scaling weight w and addition of a bias weight w_0 to create an aligned signal u, which is then non-linearly transformed by the logistic transfer function to create signal y. Another useful type of function is the hyperbolic tangent function expressed as y=tanh(u). The hyperbolic tangent function is a member of the general class of functions g(x) each representing a solution to the partial differential equation:

$$\frac{\partial g(x)}{\partial x} = 1 - \left|g(x)\right|^{r} \qquad \text{[Eqn. 8]}$$

with a boundary condition of g(0)=0. The parameter r should be selected appropriately for the assumed kurtosis of the input probability distribution. For kurtosis above 3, either the hyperbolic tangent function (r=2) or the non-member logistic transfer function is well suited for the process of this invention.
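As a hedged illustration of Eqn. 8 (not part of the original disclosure), the short Python sketch below integrates ∂g/∂x = 1 − |g|^r from the boundary condition g(0)=0 with a fourth-order Runge-Kutta step. It then compares the result with closed-form solutions: tanh(x) for r=2, and sgn(x)(1 − e^(−|x|)) for r=1. The function name `solve_eqn8` and the step count are illustrative assumptions.

```python
import math

def solve_eqn8(x_end, r, steps=2000):
    """Integrate dg/dx = 1 - |g|**r from g(0) = 0 to x_end with classic RK4."""
    def slope(g):
        # Right-hand side of Eqn. 8; it depends on g only, not on x.
        return 1.0 - abs(g) ** r

    h = x_end / steps      # a negative x_end simply integrates backwards
    g = 0.0                # boundary condition g(0) = 0
    for _ in range(steps):
        k1 = slope(g)
        k2 = slope(g + 0.5 * h * k1)
        k3 = slope(g + 0.5 * h * k2)
        k4 = slope(g + h * k3)
        g += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return g

# For r = 2 the numerical solution should track tanh(x).
for x in (-2.0, 0.5, 1.5):
    print(x, solve_eqn8(x, r=2), math.tanh(x))
```

Because the logistic function's slope y(1 − y) cannot be written in the form 1 − |y|^r, the logistic function is not a member of this family. That is why the text calls it the "non-member" logistic transfer function.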

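For the logistic transfer function (Eqn. 7), the terms in Eqn. 6 can be expressed as:

$$\frac{\partial y}{\partial x} = w\,y(1-y) \qquad \text{[Eqn. 9]}$$

$$\frac{\partial}{\partial w}\left(\frac{\partial y}{\partial x}\right) = y(1-y)\bigl(1 + w x (1-2y)\bigr) \qquad \text{[Eqn. 10]}$$

Dividing Eqn. 10 by Eqn. 9 produces a scaling measure Δw for the scaling weight learning rule of this invention based on the logistic function: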



$$\Delta w = \epsilon\left(\frac{1}{w} + x(1-2y)\right) \qquad \text{[Eqn. 11]}$$

where ε>0 is a learning rate.

Similar reasoning leads to a bias measure Δw_0 for the bias weight learning rule of this invention based on the logistic transfer function, expressed as:

$$\Delta w_0 = \epsilon\,(1-2y) \qquad \text{[Eqn. 12]}$$

These two learning rules (Eqns. 11-12) are implemented by adjusting the respective w or w_0 at a "learning rate" (ε), which is usually less than one percent (ε<0.01), as is known in the neural network arts. Referring to FIGS. 1A-1C, if the input probability density function f_x(x) is Gaussian, then the bias measure Δw_0 operates to align the steepest part of the sigmoid curve g(x) with the peak x of f_x(x), thereby matching input density to output slope in the manner suggested intuitively by Eqn. 3. The scaling measure Δw operates to align the edges of the sigmoid curve slope to the particular width (proportional to variance) of f_x(x). Thus, narrow probability density functions lead to sharply-sloping sigmoid functions.

The scaling measure of Eqn. 11 defines an "anti-Hebbian" learning rule with a second "anti-decay" term. The first, anti-Hebbian term prevents the uninformative solutions where output signal y saturates at 0 or 1. However, such an unassisted anti-Hebbian rule alone allows the slope w to disappear at zero. The second, anti-decay term (1/w) forces output signal y away from the other uninformative situation, where slope w is so flat that output signal y stabilizes at 0.5 (FIG. 1A).

These two balanced effects produce an output probability density function f_y(y) that is close to the flat unit distribution function, which is known to be the maximum entropy distribution for a random variable bounded between 0 and 1. FIG. 1B shows a family of sigmoid output distributions, with the most informative one occurring at sigmoid slope w_opt. Using the logistic transfer function as the non-linear sigmoid transformation, the learning rule in Eqn. 11 eventually brings the slope w to w_opt, thereby maximizing entropy in output signal y. The bias rule in Eqn. 12 centers the mode in the sloping region at w_0 (FIG. 1A).

If the hyperbolic tangent sigmoid function is used, the bias measure Δw_0 becomes proportional to -2y and the scaling measure Δw becomes proportional to -2xy + w^-1, such that Δw_0 = -2yε and Δw = ε(-2xy + w^-1), where ε is the learning rate. These learning rules offer the same general features and advantages as the learning rules discussed above in connection with Eqns. 11-12 for the logistic transfer function. In general, any sigmoid function in the class of solutions to Eqn. 8 selected for parametric suitability to a particular input probability distribution can be used in accordance with the process of this invention to solve the blind signal processing problem. These unexpectedly advantageous learning rules can be generalized to the multi-dimensional case.

Joint Entropy Maximization for Multiple Sources

To appreciate the multiple-signal blind processing method of this invention, consider the general network diagram shown in FIG. 2A, where the measured input signal vector [X] is transformed by way of the weight matrix [W] to produce a monotonically transformed output vector [Y] = g([W][X] + [W_0]). By analogy to Eqn. 3, the multivariate probability density function of [Y] can be expressed as:

$$f_Y([Y]) = \frac{f_X([X])}{|J|} \qquad \text{[Eqn. 13]}$$

where |J| is the absolute value of the Jacobian of the transformation that produces output vector [Y] from input vector [X]. As is well-known in the art, the Jacobian is the determinant of the matrix of partial derivatives:



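$$J = \det\begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_N} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_N}{\partial x_1} & \cdots & \dfrac{\partial y_N}{\partial x_N} \end{bmatrix} \qquad \text{[Eqn. 14]}$$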



where det denotes the determinant of a square matrix.

By analogy to the single-input case discussed above, the method of this invention maximizes the natural log of the Jacobian to maximize output entropy H(Y) for a given input entropy H(X), as can be appreciated with reference to Eqn. 5. The quantity ln|J| represents the volume of space in [Y] into which points in [X] are mapped. Maximizing this quantity attempts to spread the training set of input points evenly in [Y].
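The following Python sketch makes the ln|J| objective concrete. It is our illustration, not the patent's implementation. For a two-input logistic network [Y] = g([W][X] + [W_0]), every partial derivative is ∂y_i/∂x_j = y_i(1 − y_i)W_ij. The Jacobian therefore factors as ln|J| = ln|det[W]| + Σ_i ln(y_i(1 − y_i)). The sketch checks this closed form against a determinant built from finite-difference partial derivatives (Eqn. 14). The weights, input and helper names are arbitrary assumptions.

```python
import math

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

def forward(W, W0, x):
    """[Y] = g([W][X] + [W0]) for a 2x2 weight matrix W and bias vector W0."""
    return [logistic(W[i][0] * x[0] + W[i][1] * x[1] + W0[i]) for i in range(2)]

def log_abs_jacobian(W, W0, x):
    """Closed form: ln|J| = ln|det W| + sum_i ln(y_i (1 - y_i))."""
    det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    y = forward(W, W0, x)
    return math.log(abs(det)) + sum(math.log(yi * (1.0 - yi)) for yi in y)

def log_abs_jacobian_numeric(W, W0, x, h=1e-6):
    """ln|det(dy_i/dx_j)| with each partial derivative from central differences."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        yp, ym = forward(W, W0, xp), forward(W, W0, xm)
        for i in range(2):
            J[i][j] = (yp[i] - ym[i]) / (2 * h)
    return math.log(abs(J[0][0] * J[1][1] - J[0][1] * J[1][0]))

W = [[1.2, -0.4], [0.3, 0.9]]   # arbitrary example weights
W0 = [0.1, -0.2]
x = [0.7, -1.1]
print(log_abs_jacobian(W, W0, x), log_abs_jacobian_numeric(W, W0, x))
```

Because the ln|det[W]| term does not depend on the data, its gradient is the same for every training sample. The data-dependent sum over outputs is where the anti-Hebbian terms of the following learning rules come from.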

For the commonly used logistic transfer function, the resulting learning rules can be proven to be as follows:



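$$\Delta[W] \propto \bigl[[W]^T\bigr]^{-1} + \bigl(\mathbf{1} - 2[Y]\bigr)[X]^T \qquad \text{[Eqn. 15]}$$

$$\Delta[W_0] \propto \mathbf{1} - 2[Y] \qquad \text{[Eqn. 16]}$$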



In Eqn. 15, the first, anti-Hebbian term has become an outer product of vectors. The second, anti-decay term has generalized to an "anti-redundancy" term in the form of the inverse of the transpose of the weight matrix [W]. For an individual weight W_ij, Eqn. 15 can be written as follows:

$$\Delta W_{ij} = \epsilon\left(\frac{\operatorname{cof}(W_{ij})}{\det[W]} + x_j\,(1 - 2y_i)\right) \qquad \text{[Eqn. 17]}$$

where cof(W_ij) denotes the cofactor of element W_ij, which is known to be (-1)^(i+j) times the determinant of the matrix obtained by removing the i-th row and the j-th column from the square weight matrix [W], and ε is the learning rate. Similarly, the i-th bias measure ΔW_i0 can be expressed as follows:

$$\Delta W_{i0} = \epsilon\,(1 - 2y_i) \qquad \text{[Eqn. 18]}$$
The rules shown in Eqns. 17-18 are the same as those for the single unit mapping (Eqns. 11-12), except that the instability occurs at det[W]=0 instead of w=0. Thus, any degenerate weight matrix, that is, any weight matrix having a zero determinant, leads to instability. This fact enables different outputs Y_i to learn to represent different things about the inputs X_j. When the weight vectors entering two different outputs become too similar, det[W] becomes small and the natural learning process forces these approaching weight vectors apart. This effect is mediated by the numerator cof(W_ij), which approaches zero to indicate degeneracy in the weight matrix of the rest of the layer not associated with input X_j or output Y_i.

Other sigmoidal transformations yield other training rules that are similarly advantageous, as discussed above in connection with Eqn. 8. For instance, the hyperbolic tangent function yields rules very similar to those of Eqns. 17-18.
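To show Eqns. 17-18 at work, here is a minimal Python sketch of continuous-mode (per-sample) training of a 2x2 logistic network on two mixed super-Gaussian sources. It is our own illustration, not the inventor's code. Several choices are assumptions made only for the example: the Laplacian sources, the mixing matrix A, the sample count, the decreasing learning-rate schedule and all function names. The anti-redundancy term uses the 2x2 cofactors cof(W_ij) divided by det[W].

```python
import math
import random

def logistic(u):
    # Logistic transfer function (Eqn. 7), written to avoid overflow in exp().
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def laplace(rng):
    # Unit-variance Laplacian sample: a super-Gaussian stand-in source.
    return rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) / math.sqrt(2.0)

def infomax_2x2(mixtures, rates=(0.02, 0.01, 0.005, 0.0025, 0.001), seed=1):
    """Per-sample updates with Eqns. 17-18; one shuffled pass per learning rate."""
    rng = random.Random(seed)
    W = [[1.0, 0.0], [0.0, 1.0]]
    W0 = [0.0, 0.0]
    order = list(range(len(mixtures)))
    for eps in rates:
        rng.shuffle(order)
        for n in order:
            x = mixtures[n]
            y = [logistic(W[i][0] * x[0] + W[i][1] * x[1] + W0[i]) for i in range(2)]
            det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
            cof = [[W[1][1], -W[1][0]], [-W[0][1], W[0][0]]]  # cof(W_ij), computed before updating
            for i in range(2):
                for j in range(2):
                    W[i][j] += eps * (cof[i][j] / det + x[j] * (1.0 - 2.0 * y[i]))  # Eqn. 17
                W0[i] += eps * (1.0 - 2.0 * y[i])                                   # Eqn. 18
    return W, W0

rng = random.Random(0)
A = [[1.0, 0.6], [0.4, 1.0]]  # hypothetical mixing matrix [A]
sources = [(laplace(rng), laplace(rng)) for _ in range(4000)]
mixtures = [[A[0][0] * s1 + A[0][1] * s2, A[1][0] * s1 + A[1][1] * s2] for s1, s2 in sources]
W, W0 = infomax_2x2(mixtures)

# Overall source-to-output matrix [W][A]; good separation makes it close to a scaled permutation.
P = [[sum(W[i][k] * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print(P)
```

In the mixtures, the larger entry in each row of A is only 1.67 and 2.5 times the smaller one. After training, each row of [W][A] should instead be dominated by a single, different column.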



The usefulness of these blind source separation network learning rules can be appreciated with reference to the discussion below in connection with FIG. 5.

Blind Deconvolution in a Causal Filter

FIGS. 2B-2C illustrate the blind deconvolution problem. FIG. 2C shows an unobserved data sequence S(t) entering an unknown channel A(t), which responsively produces the measured signal X(t). X(t) can be blindly equalized through a causal filter W(t) to produce an output signal U(t) approximating the original unobserved data sequence S(t). FIG. 2B shows the time series X(t), which is presumed to have a length of J samples (not shown). X(t) is convolved with a causal filter having I weighted taps, W_1, ..., W_I, and impulse response W(t). The causal filter output signal U(t) is then passed through a non-linear sigmoid function g(·) to create the training signal Y(t) (not shown). This system can be expressed either as a convolution (Eqn. 21) or as a matrix equation (Eqn. 22) as follows:

$$Y(t) = g\bigl(W(t) * X(t)\bigr) \qquad \text{[Eqn. 21]}$$

$$[Y] = g\bigl([W][X]\bigr) \qquad \text{[Eqn. 22]}$$

in which [Y] = g([U]) and [X] are signal sample vectors having J samples. Of course, the vector ordering need not be temporal. For causal filtering, [W] is a banded lower triangular J×J square matrix expressed as:

$$[W] = \begin{bmatrix}
W_1 & 0 & 0 & \cdots & 0 \\
W_2 & W_1 & 0 & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
W_I & \cdots & W_2 & W_1 & 0 \\
0 & W_I & \cdots & W_2 & W_1
\end{bmatrix} \qquad \text{[Eqn. 23]}$$

Assuming an ensemble of time series, the joint probability distribution functions f_Y([Y]) and f_X([X]) are related by the Jacobian of the Eqn. 22 transformation according to Eqn. 13. The ensemble can be "created" from a single time series by breaking the series into sequences of length L, which reduces [W] in Eqn. 23 to an L×L lower triangular matrix. The Jacobian of the transformation is then written as follows:

$$J = \det[W] \prod_{t=1}^{L} \frac{\partial Y_t}{\partial U_t} \qquad \text{[Eqn. 24]}$$

which may be decomposed into the determinant of the weight matrix [W] of Eqn. 23 and the product of the slopes of the sigmoidal squashing function for all times t. Because [W] is lower-triangular, its determinant is merely the product of the diagonal values, which is W_1^L. As before, the output signal entropy H(Y) is maximized by maximizing the logarithm of the Jacobian, which may be written as:

$$\ln|J| = \ln\left|W_1^{L}\right| + \sum_{t=1}^{L} \ln\left|\frac{\partial Y_t}{\partial U_t}\right| \qquad \text{[Eqn. 25]}$$

If the hyperbolic tangent is selected as the non-linear sigmoid function, then differentiation with respect to the filter weights W(t) provides the following two simple learning rules:

$$\Delta W_1 \propto \sum_{t=1}^{L}\left(\frac{1}{W_1} - 2X_t Y_t\right) \qquad \text{[Eqn. 26]}$$

$$\Delta W_i \propto \sum_{t=1}^{L}\left(-2X_{t-i+1} Y_t\right), \quad \text{where } i > 1 \qquad \text{[Eqn. 27]}$$

In Eqns. 26-27, W_1 is the "leading weight" and W_i (i = 2, ..., I) represent the remaining weights in a delay line having I weighted taps linking the input signal sample X_{t-i+1} to the output signal sample Y_t. The leading weight W_1 therefore adapts like a weight connected to a neuron with only one input (Eqn. 11 above). The other tap weights {W_i} attempt to decorrelate the past input from the present output. Thus, the leading weight W_1 keeps the causal filter from "shrinking".

Other sigmoidal functions may be used to generate similarly useful learning rules, as discussed above in connection with Eqn. 8. The equivalent rules for the logistic transfer function discussed above can be easily deduced to be:

$$\Delta W_1 \propto \sum_{t=1}^{L}\left(\frac{1}{W_1} + X_t\,(1 - 2Y_t)\right) \qquad \text{[Eqn. 28]}$$

$$\Delta W_i \propto \sum_{t=1}^{L} X_{t-i+1}\,(1 - 2Y_t), \quad \text{where } i > 1 \qquad \text{[Eqn. 29]}$$

The usefulness of these causal filter learning rules can be appreciated with reference to the discussion below in connection with FIGS. 6 and 7.
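The following Python sketch checks, under stated assumptions, that the tanh rules of Eqns. 26-27 are the gradient of the log-Jacobian of Eqn. 25 for a causal filter. It assumes samples before the start of the length-L block are zero. For tanh, ∂Y_t/∂U_t = 1 − Y_t², so ln|J| = L·ln|W_1| + Σ_t ln(1 − Y_t²). The sketch compares the analytic rule with a central-difference gradient. The tap values, the test sequence and all helper names are hypothetical.

```python
import math

def filter_outputs(W, x):
    """U_t = sum_i W_i X_{t-i+1} (zero before the block), Y_t = tanh(U_t)."""
    y = []
    for t in range(len(x)):
        # Python index i corresponds to tap W_{i+1}, which multiplies X_{t-i}.
        u = sum(W[i] * x[t - i] for i in range(len(W)) if t - i >= 0)
        y.append(math.tanh(u))
    return y

def log_abs_jacobian(W, x):
    """Eqn. 25 for tanh: ln|J| = L ln|W_1| + sum_t ln(1 - Y_t^2)."""
    y = filter_outputs(W, x)
    return len(x) * math.log(abs(W[0])) + sum(math.log(1.0 - yt * yt) for yt in y)

def tanh_rule_gradient(W, x):
    """Eqns. 26-27: leading weight W_1 first, then the remaining taps."""
    y = filter_outputs(W, x)
    grad = [sum(1.0 / W[0] - 2.0 * x[t] * y[t] for t in range(len(x)))]
    for i in range(1, len(W)):
        grad.append(sum(-2.0 * x[t - i] * y[t] for t in range(i, len(x))))
    return grad

def numeric_gradient(W, x, h=1e-6):
    grad = []
    for i in range(len(W)):
        Wp, Wm = list(W), list(W)
        Wp[i] += h
        Wm[i] -= h
        grad.append((log_abs_jacobian(Wp, x) - log_abs_jacobian(Wm, x)) / (2 * h))
    return grad

x = [0.3, -1.2, 0.8, 0.05, -0.6, 1.4, -0.2, 0.9]  # arbitrary test sequence X_t
W = [1.0, -0.5, 0.25]                               # hypothetical taps W_1, W_2, W_3
print(tanh_rule_gradient(W, x))
print(numeric_gradient(W, x))
```

Only the leading weight carries the 1/W_1 term, because the determinant W_1^L of the lower-triangular matrix in Eqn. 23 involves no other tap. This is the sense in which W_1 keeps the filter from "shrinking".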
Information Maximization v. Statistical Dependence 

The process of this invention relies on the unexpectedly advantageous observation that, under certain conditions, the maximization of the mutual information I(Y,X) operates to minimize the mutual information between separate outputs {U_i} in a multiple source network, thereby performing the redundancy reduction required to solve the blind signal processing problem. The usefulness of this relationship was unsuspected until now. When limited to the usual logistic transfer or hyperbolic tangent sigmoid functions, this invention appears to be limited to the general class of super-Gaussian signals having kurtosis greater than 3. This limitation can be understood by considering the following example shown in FIGS. 3A-3C.

Referring to FIG. 3A, consider a network with two outputs y_1 and y_2, which may be either two output channels from a blind source separation network or two signal samples at different times for a blind deconvolution network. The joint entropy of these two variables can be written as:



$$H(y_1, y_2) = H(y_1) + H(y_2) - I(y_1, y_2) \qquad \text{[Eqn. 30]}$$

Thus, the joint entropy can be maximized by maximizing the individual entropies while minimizing the mutual information I(y_1,y_2) shared between the two. When the mutual information I(y_1,y_2) is zero, the two variables y_1 and y_2 are statistically independent. The joint probability density function is then equal to the product of the individual probability density functions, so that f_{y1y2}(y_1,y_2) = f_{y1}(y_1) f_{y2}(y_2). Both ICA and the "whitening" approach to deconvolution are examples of pair-wise minimization of mutual information I(y_1,y_2) for all pairs y_1 and y_2. This process is variously denominated factorial code learning, predictability minimization, independent component analysis (ICA) and redundancy reduction.

The process of this invention is a stochastic gradient ascent procedure that maximizes the joint entropy H(y_1,y_2). It thereby differs sharply from the "whitening" and ICA procedures known for minimizing mutual information I(y_1,y_2). The system of this invention rests on the unexpectedly advantageous discovery of the general conditions under which maximizing joint entropy operates to reduce mutual information (redundancy), thereby reducing the statistical dependence of the two outputs y_1 and y_2.

Under many conditions, maximizing joint entropy H(y_1,y_2) does not guarantee minimization of mutual information I(y_1,y_2) because of interference from the single entropy terms H(y_i) in Eqn. 30. FIG. 3C shows one pathological example, where a "diagonal" projection of two independent, uniformly-distributed variables x_1 and x_2 is preferred over the "independent" projection shown in FIG. 3B when joint entropy is maximized. This occurs because of a mismatch between the requisite alignment of input probability distribution function and sigmoid slope discussed above in connection with FIGS. 1A-1C and Eqn. 8. The learning procedure of this invention achieves a higher value of joint entropy in FIG. 3C than the desired value shown in FIG. 3B. The reason is the higher individual output entropy values H(y_i) arising from the triangular probability distribution functions of (x_1+x_2) and (x_1-x_2) in FIG. 3C, which more closely match the sigmoid slope (not shown). This interferes with the minimization of mutual information I(y_1,y_2), because increases in the individual entropies H(y_i) offset or mask undesired increases in mutual information, providing the higher joint entropy H(y_1,y_2) sought by the process.

The inventor believes, however, that such interference has little significant effect in most practical situations. As mentioned above in connection with Eqn. 8, the sigmoidal function is not limited to the usual two functions. Indeed, it can be tailored to the particular class of probability distribution functions expected by the process of this invention. Any function that is a member of the class of solutions to the partial differential Eqn. 8 provides a sigmoidal function suitable for use with the process of this invention. It can be shown that this general class of sigmoidal functions leads to the following two learning rules according to this invention:

$$\Delta[W] \propto \bigl[[W]^T\bigr]^{-1} - r\,\Bigl(\bigl|[Y]\bigr|^{r-1}\operatorname{sgn}([Y])\Bigr)[X]^T \qquad \text{[Eqn. 31]}$$

$$\Delta[W_0] \propto -r\,\bigl|[Y]\bigr|^{r-1}\operatorname{sgn}([Y]) \qquad \text{[Eqn. 32]}$$

where the absolute value, power and sign are applied element-wise, with

$$\operatorname{sgn}(z) = \begin{cases} +1 & \text{for } z > 0 \\ 0 & \text{for } z = 0 \\ -1 & \text{for } z < 0 \end{cases}$$

and where parameter r is chosen appropriately for the presumed kurtosis of the probability distribution function of the source signals [S]. This formalism can be extended to cover skewed and multimodal input distributions by extending Eqn. 8 to produce an increasingly complex polynomial in g(x) such that

$$\frac{\partial g(x)}{\partial x} = f\bigl(g(x)\bigr).$$
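A minimal Python sketch of one per-sample step of the general-r rules of Eqns. 31-32 follows, written out for a hypothetical 2x2 network. It assumes the element-wise form ΔW_ij ∝ ([[W]^T]^-1)_ij − r·|y_i|^(r−1)·sgn(y_i)·x_j and assumes r ≥ 1. The function names and sample values are illustrative only. For r=2, the anti-Hebbian factor collapses to −2y_i, which is the hyperbolic tangent rule.

```python
import math

def sgn(z):
    # Sign function: +1, 0 or -1.
    return (z > 0) - (z < 0)

def anti_hebb_factor(y, r):
    """Per-output factor -r |y|^(r-1) sgn(y) from Eqns. 31-32 (assumes r >= 1)."""
    return -r * abs(y) ** (r - 1) * sgn(y)

def eqn31_32_update(W, x, y, r):
    """Return (dW, dW0), each proportional to Eqns. 31-32, for a 2x2 network."""
    det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    # Anti-redundancy term [[W]^T]^-1 written out for the 2x2 case.
    inv_t = [[W[1][1] / det, -W[1][0] / det],
             [-W[0][1] / det, W[0][0] / det]]
    f = [anti_hebb_factor(yi, r) for yi in y]
    dW = [[inv_t[i][j] + f[i] * x[j] for j in range(2)] for i in range(2)]
    return dW, f

W = [[1.0, 0.3], [-0.2, 0.8]]           # hypothetical weights
x = [0.5, -1.5]                        # one input sample
y = [math.tanh(0.2), math.tanh(-0.4)]  # outputs of an r = 2 (tanh) unit
for r in (1, 2, 3):
    print(r, eqn31_32_update(W, x, y, r))
```

Smaller r gives a factor that grows more slowly with |y|, and larger r a faster-growing one. This is the knob the text refers to when it says r should be chosen for the presumed kurtosis of the sources.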

Even with the usual logistic transfer function (Eqn. 7) and the hyperbolic tangent function (r=2), it appears that the problem of individual entropy interference is limited to sub-Gaussian probability distribution functions having a kurtosis less than 3. Advantageously, many actual analog signals are super-Gaussian in distribution, including the speech signals used in the experimental verification of the system of this invention. They have longer tails and are more sharply peaked than the Gaussian distribution, as may be appreciated with reference to the three distribution functions shown in FIGS. 8A-8C. FIG. 8A shows a typical speech probability distribution function, FIG. 8B shows the probability distribution function for rock music, and FIG. 8C shows a typical Gaussian white noise distribution. The inventor has found that joint entropy maximization for sigmoidal networks always minimizes the mutual information between the network outputs for all super-Gaussian signal distributions tested. Special sigmoid functions can be selected that accomplish the same result for sub-Gaussian signal distributions as well, although the precise learning rules must then be selected in accordance with the parametric learning rules of Eqns. 31-32.
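Because the super-/sub-Gaussian distinction above turns on whether kurtosis exceeds 3, here is a small Python sketch that estimates the fourth standardized moment from samples. It is an illustration only. A Laplacian (super-Gaussian, population kurtosis 6), a Gaussian (3) and a uniform (sub-Gaussian, 1.8) distribution stand in for signal types. The sample size and seed are arbitrary assumptions.

```python
import random

def kurtosis(samples):
    """Fourth standardized moment E[(s - m)^4] / var^2; equals 3 for a Gaussian."""
    n = len(samples)
    m = sum(samples) / n
    var = sum((s - m) ** 2 for s in samples) / n
    return sum((s - m) ** 4 for s in samples) / (n * var * var)

rng = random.Random(0)
N = 100_000
laplace = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(N)]  # super-Gaussian
gauss = [rng.gauss(0.0, 1.0) for _ in range(N)]
uniform = [rng.uniform(-1.0, 1.0) for _ in range(N)]                          # sub-Gaussian
for name, data in (("laplace", laplace), ("gaussian", gauss), ("uniform", uniform)):
    print(name, round(kurtosis(data), 2))
```

Sample kurtosis of heavy-tailed data converges slowly, so the Laplacian estimate scatters more around its population value than the other two do.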

Different sigmoid non-linearities provide different anti-Hebbian terms. Table 1 provides the anti-Hebbian terms from the learning rules resulting from several interesting non-linear transformation functions. The information-maximization rule consists of an anti-redundancy term, which always has the form [[W]^T]^-1, and an anti-Hebbian term that keeps the unit from saturating.
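TABLE 1

$$\begin{array}{lll}
\text{Function} & \text{Slope } \partial y_i/\partial u_i & \text{Anti-Hebb term} \\ \hline
\text{Eqn. 8 solution} & 1 - |y_i|^{r} & -r\,x_j\,|y_i|^{r-1}\operatorname{sgn}(y_i) \\
y_i = \tanh(u_i) & 1 - y_i^{2} & -2x_j y_i \\
y_i = (1 + e^{-u_i})^{-1} & y_i(1 - y_i) & x_j(1 - 2y_i) \\
y_i = \operatorname{erf}(u_i) & \tfrac{2}{\sqrt{\pi}}\,e^{-u_i^{2}} & -2x_j u_i \\
y_i = e^{-u_i^{2}} \ \text{(Gaussian RBF)} & -2u_i\,e^{-u_i^{2}} & x_j(1 - 2u_i^{2})/u_i
\end{array}$$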
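Each anti-Hebbian entry in Table 1 is x_j times d/du_i ln|∂y_i/∂u_i|. The hedged Python sketch below recomputes that factor numerically with nested central differences for the logistic, tanh, erf and Gaussian radial basis function rows and compares it with the tabulated expressions. The Eqn. 8 row is omitted because its y(u) has no simple closed form for general r. The dictionary name, step size and test points are illustrative assumptions.

```python
import math

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

# name: (transfer function y(u), tabulated anti-Hebb factor without the x_j multiplier)
TABLE_1 = {
    "logistic": (logistic, lambda u: 1.0 - 2.0 * logistic(u)),
    "tanh": (math.tanh, lambda u: -2.0 * math.tanh(u)),
    "erf": (math.erf, lambda u: -2.0 * u),
    "gaussian_rbf": (lambda u: math.exp(-u * u), lambda u: (1.0 - 2.0 * u * u) / u),
}

def numeric_factor(g, u, h=1e-4):
    """d/du ln|g'(u)| using nested central differences."""
    def log_slope(v):
        return math.log(abs((g(v + h) - g(v - h)) / (2 * h)))
    return (log_slope(u + h) - log_slope(u - h)) / (2 * h)

for name, (g, factor) in TABLE_1.items():
    print(name, [(round(numeric_factor(g, u), 6), round(factor(u), 6)) for u in (0.7, -1.3)])
```

The Gaussian RBF factor has u_i in its denominator, so it blows up as u_i approaches the basis-function center at 0. This is the instability the following paragraph describes.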
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Table 1 shows that only the Eqn. 8 solutions (Including 
the hyperbolic tangent function for r=a) and the logistic 
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transfer functions produce anti-Hebc-ian terms mat cm yield 

highcr-orda statistics. The other functions use the net input wrt-.fi/jtt)-, ( x X -(/-!+ IW ) 

u, as the output variable rather using the actual transformed A ' *V*»i*=i / 

output y r Tests performed by the inventor «how thai erf ^ seIccted mmifcr function. 

function Is unsuitable for blind separation. In fact, stable S jj mc hyirbolic tangent function is selected as the sigmoi- 

wdght matrices using the -2x^ can becaloilaul from the nQ ^ ugjl ^ me f 0 Uowing tndning rules ait used in the 

covariance matrix of the inputs alone. The learning rule for ™ iveotion: 

a Gaussian radial basis function node is interesting because 5y5lcm OI " 

it contains u, in both me numerator and denominator. The ^ 

denominator term limits the usefulness of such a rule iQ AV(pB< . ( -^Lpy- - 2X*Y, ) 

because data points near me radial basis function center v ' 

would cause instability. Radial transfer functions are gen- B 4 . ( . rx^y,) when i > i (Bp. 351 

^HetWork according to this indention. Each of the four multipath distortion that requires blu d deconvolunoo as 

Zt S pS«t?C«- output signals such as well as ad unknown mixture of up to three unknown source 

mfcleS a&LEved from a microphone at . "cock- signal. {%}. l of the source s^.non pls^ exe^ 

Z m7 "Ti output signal Each of the four plified by plane 24. operates substantially as duoissed 

ZSSmJSSb «.} ^ touted to the four input » above In connection with FIG. 5 for the three input stgnals 

dg^byTi*?* thil UW1X.WWJ. The four by providing three output contribution, to the um- 

biaTweiehts {wT} are updated regularly according to the ming element, exemplified by suinmmg arcmt 26. PUne 24 

SSfc flV 18 dlscus*d labove and cacb of the contains the lead wdghta for the 16 individual causal fillers 

siWscalingwdghts {W„> are updated regularly accord- formed by the network. Preliminary "^"^^ 

mgto the leJninglule of Eqn. 17 discussed above. These a by the inventor with speech "goals so which signals «« 

Zates can occur after evo? signal sample or may be simultaneously separated and deconvolved using the learn- 

£«m^ed ov^tnauy signal Zples fa updating In a ing rule discussed above resulted in recovery of apparently 

global mode. Each of the weight element, in FIG. 5 exem- perfect speech. 

plified by element 18 include, the logic necessary to produce Experimental Results 

kndaccumulate the AW update according to the applicable 30 The inventor conducted experiments using three-second 

segments of speech recorded from various speakers with 

TheKparaUon network in FIG. 5 can also be used to only one speaker per recording. All speech segments were 

renUe interfering signals torn. receive signal merely by. sampled at 8.000 Hi from the output of the auxiliary 

example, isolating the interferer as output signal $U_n$ and then subtracting $U_n$ from the received signal of interest, such as received signal $X_j$. In such a configuration, the network shown in FIG. 5 is herein denominated an "interference cancelling network."

FIG. 6 shows a functional block diagram illustrating a simple causal filter operated according to the method of this invention for blind deconvolution. A time-varying signal is presented to the network at input 22. The five spaced taps $\{T_i\}$ are separated by a time-delay interval $\tau$ in the manner well known in the art for transversal filters. The five weight factors $\{W_i\}$ are established and updated by internal logic (not shown) according to the learning rule shown in Eqns. [...]. The five weighted tap signals $\{V_i\}$ are summed at a summation device 24 to produce the single time-varying output signal $U_t$. Because input signal $X_t$ includes an unknown non-linear combination of time-delayed versions of an unknown source signal $S_t$, the system of this invention adjusts the tap weights $\{W_i\}$ such that the output signal $U_t$ approximates the unknown source signal $S_t$.

FIG. 7 shows a functional block diagram illustrating the combined blind separation network and blind deconvolution filter system of this invention. The blind separation learning rules and the blind deconvolution rules above can be easily combined in the form exemplified by FIG. 7. The objective is to maximize the natural logarithm of a Jacobian with local lower triangular structure, which yields the expected learning rule that forces the leading weights $(W_{ij1})$ in the filters to follow the blind separation rules and all others to follow a decorrelation rule, except that tapped weights $(W_{ijk})$ are interposed between a [...] and an output. The outputs $\{U_i\}$ are used to produce a set of training signals given by Eqn. 33.

[...] microphone of a Sparc-10 workstation. No special post-processing was performed on the waveforms other than the normalization of amplitudes to a common interval [-3, 3] to permit operation with the equipment used. The network was trained using the stochastic gradient ascent procedure of this invention.

Unsupervised learning in a neural network may proceed either continuously or in a global mode. Continuous learning consists in slightly modifying the weights after each propagation of an input vector through the network. This kind of learning is useful for signals that arrive in real time or when local storage capacity is restricted. In a global learning mode, a multiplicity of samples are propagated through the network and the results [...] computed exactly on these data, and the weights are modified only after accumulating and processing the multiplicity of signal samples.

To reduce computational [...], the experiments were performed using the global learning mode. To ensure that the input ensemble is stationary in time, random points were selected from the three-second window to generate the [...], with 0.005 preferred. As used herein, learning rate $\epsilon$ establishes the actual weight adjustment such that $W_{ij} \leftarrow W_{ij} + \epsilon\,\Delta W_{ij}$, as is known in the art. The inventor found that reducing the learning rate over the learning process was useful.

Blind Separation Results: The network architecture shown in FIGS. 2A and 5, together with the learning rules in Eqns. 17-18, was found to be sufficient to perform blind separation of at least seven unknown source signals. A random mixing matrix [A] was generated with values usually in the interval [-1, 1]. The mixing matrix [A] was used to generate the several mixed time series $\{X_j\}$ from the original sources $\{S_i\}$. The unmixing matrix [W] and the bias vector $\{W_{i0}\}$ were then trained according to the rules in Eqns. 17-18.
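The global learning mode and the update $W_{ij} \leftarrow W_{ij} + \epsilon\,\Delta W_{ij}$ described above can be illustrated with a minimal sketch. It is not the inventor's implementation: it assumes two sources, the logistic nonlinearity $g(x) = (1+e^{-x})^{-1}$, and the logistic-case measures recited in claim 3, $\Delta W_{ij} \propto \operatorname{cof}(W_{ij})/\det(W_{ij}) + X_j(1-2Y_i)$ and $\Delta W_{i0} \propto 1-2Y_i$, which are the gradients of $\ln|J|$. The Laplacian test sources, the mixing matrix `A`, the learning rate and all function names are hypothetical choices for illustration.

```python
import math
import random

def log_gprime(u):
    """ln g'(u) for the logistic g, computed stably: g'(u) = e^-|u| / (1 + e^-|u|)^2."""
    a = abs(u)
    return -a - 2.0 * math.log1p(math.exp(-a))

def logistic(u):
    """Logistic nonlinearity g(u) = 1/(1+e^-u), evaluated without overflow."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def det2(w):
    return w[0][0] * w[1][1] - w[0][1] * w[1][0]

def cof_over_det(w):
    """cof(W_ij)/det(W) for a 2x2 W, i.e. the entries of (W^T)^-1."""
    d = det2(w)
    return [[w[1][1] / d, -w[1][0] / d],
            [-w[0][1] / d, w[0][0] / d]]

def forward(w, w0, x):
    """U_i = sum_j W_ij X_j + W_i0."""
    return [w[i][0] * x[0] + w[i][1] * x[1] + w0[i] for i in range(2)]

def mean_log_jacobian(w, w0, xs):
    """Batch average of ln|J| = ln|det W| + sum_i ln g'(U_i)."""
    base = math.log(abs(det2(w)))
    return sum(base + sum(log_gprime(u) for u in forward(w, w0, x)) for x in xs) / len(xs)

def mean_gradient(w, w0, xs):
    """Batch averages of dW_ij = cof(W_ij)/det(W) + X_j(1-2Y_i) and dW_i0 = 1-2Y_i."""
    c = cof_over_det(w)
    gw = [[0.0, 0.0], [0.0, 0.0]]
    g0 = [0.0, 0.0]
    for x in xs:
        y = [logistic(u) for u in forward(w, w0, x)]
        for i in range(2):
            g0[i] += 1.0 - 2.0 * y[i]
            for j in range(2):
                gw[i][j] += c[i][j] + x[j] * (1.0 - 2.0 * y[i])
    n = len(xs)
    return [[v / n for v in row] for row in gw], [v / n for v in g0]

def global_step(w, w0, xs, eps):
    """One global-mode update: accumulate over the whole batch, then W <- W + eps*dW."""
    gw, g0 = mean_gradient(w, w0, xs)
    return ([[w[i][j] + eps * gw[i][j] for j in range(2)] for i in range(2)],
            [w0[i] + eps * g0[i] for i in range(2)])

# Hypothetical demo: two Laplacian sources mixed by an assumed matrix A.
rng = random.Random(0)
A = [[1.0, 0.6], [0.4, 1.0]]
src = [(rng.expovariate(1.0) - rng.expovariate(1.0),
        rng.expovariate(1.0) - rng.expovariate(1.0)) for _ in range(500)]
xs = [(A[0][0] * s0 + A[0][1] * s1, A[1][0] * s0 + A[1][1] * s1) for s0, s1 in src]

w, w0 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
before = mean_log_jacobian(w, w0, xs)
for _ in range(50):
    w, w0 = global_step(w, w0, xs, eps=0.01)
after = mean_log_jacobian(w, w0, xs)
```

Accumulating the measures over the whole batch before changing the weights is what distinguishes the global mode from continuous learning, which would instead apply the update once per sample.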

FIG. M shows the results of the attempted separation of five source signals. The mixtures $\{X_j\}$ formed an incomprehensible babble that could not be penetrated by the human ear. The unmixed solutions shown as $\{Y_i\}$ were obtained after presenting about 500,000 time samples, equivalent to 20 passes through the complete three-second series. Any residual interference in the output vector elements $\{Y_i\}$ is inaudible to the human ear. This can be appreciated with reference to the permutation structure of the product of the final weight matrix [W] and the initial mixing matrix [A].

As can be seen, the residual interference factors are only a few percent of the single substantial entry in each row and column, thereby demonstrating that weight matrix [W] substantially removes all effects of mixing matrix [A] from the signals.
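The permutation-structure check on the product of [W] and [A] can be made concrete. The sketch below forms $P = [W][A]$ and reports, for every row and column, the largest off-peak magnitude as a fraction of the dominant entry. The matrices and the 5% tolerance are hypothetical values chosen for illustration; they are not the experimental results.

```python
def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def residual_ratios(p):
    """For each row of p: the largest off-peak magnitude divided by the peak magnitude."""
    out = []
    for row in p:
        mags = sorted((abs(v) for v in row), reverse=True)
        out.append(mags[1] / mags[0])
    return out

def is_scaled_permutation(p, tol=0.05):
    """True if every row and column has one dominant entry, all others below tol of it."""
    n = len(p)
    peak_cols = [max(range(n), key=lambda j: abs(p[i][j])) for i in range(n)]
    if sorted(peak_cols) != list(range(n)):
        return False
    return all(r < tol for r in residual_ratios(p) + residual_ratios(transpose(p)))

# Hypothetical final weights [W] for an assumed 2x2 mixing matrix [A].
A = [[1.0, 0.6], [0.4, 1.0]]
W = [[-0.5, 1.2], [1.5, -0.9]]
P = matmul(W, A)   # approximately [[-0.02, 0.9], [1.14, 0.0]]
```

A scaled permutation is the best that can be expected, since blind separation cannot recover the order or the amplitude of the sources.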

In a second experiment, seven source signals, including five speaking voices, a rock music selection and white noise, were successfully separated, although the separation was still slowly improving after 2.5 million iterations, equivalent to 100 passes through the three-second data. For two sources, convergence is normally achieved in less than one pass through the three seconds of data by the system of this invention.

The blind separation procedure of this invention was found to fail only when (a) more than one unknown source is Gaussian white noise, or (b) the mixing matrix [A] is nearly singular. Both weaknesses are understandable because no procedure can separate independent Gaussian sources and, if [A] is nearly singular, then any proper



The first whitening example shows what happens when "deconvolving" a speech signal that has not been corrupted (convolving filter [A] is a delta-function). If the tap spacing is close enough, as in this case where the tap spacing is identical to the sample interval, the process of this invention learns the whitening filter shown in FIG. 11C that flattens the amplitude spectrum of the speech up to the Nyquist limit (equivalent to half of the sampling frequency). FIG. 9A shows the spectrum of the speech sequence before deconvolution and FIG. 9B shows the speech spectrum after deconvolution by the filter shown in FIG. 11C. Whitened speech sounds like a clear, sharp version of the original signal because the phase structure is preserved. By using all available frequency levels equally, the system is maximizing information throughput in the channel. Thus, when the original signal is not white, the deconvolving filter of this invention will recover a whitened version of it rather than the exact original. However, when the filter taps are spaced further apart, as in FIGS. 11E-11L, there is less opportunity for simple whitening.
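Whitening, in the sense used above, means flattening the amplitude spectrum, which is equivalent to removing the correlation between successive samples. As a minimal sketch, assuming a first-order autoregressive signal as a stand-in for speech (a hypothetical choice), the two-tap prediction-error filter $[1, -a]$ makes the output white. Unlike the process of this invention, the filter coefficients here are fixed from the known model rather than learned blindly.

```python
import random

def ar1(a, n, seed=0):
    """Hypothetical non-white signal: x_t = a*x_{t-1} + e_t with white Gaussian e_t."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = a * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x

def fir(w, x):
    """Causal transversal filter y_t = sum_k w[k] * x[t-k]; samples before t = 0 are zero."""
    return [sum(w[k] * x[t - k] for k in range(len(w)) if t >= k) for t in range(len(x))]

def lag1_autocorrelation(x):
    """Sample correlation between successive values; close to 0 for a white sequence."""
    m = sum(x) / len(x)
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, len(x)))
    den = sum((v - m) ** 2 for v in x)
    return num / den

x = ar1(0.9, 20000)
y = fir([1.0, -0.9], x)   # prediction-error (whitening) filter for this AR(1) model
```

The lag-1 correlation of `x` is close to 0.9, while that of `y` is close to zero, which corresponds to a flat amplitude spectrum.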

In the second "barrel-effect" example shown in FIG. 11E, a 6.25 ms echo is added to the speech signal. This creates a mild audible barrel effect. Because the filter of FIG. 11E is finite in length, its inverse is infinite in length but is shown in FIG. 11F as truncated. The inverting filter learned in FIG. 11G resembles FIG. 11F, although the resemblance tails off toward the left side because the process of this invention actually learns an optimal filter of finite length instead of a truncated infinite optimal filter. The resulting deconvolution shown in FIG. 11H is very good.
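The remark that a finite echo filter has an infinitely long inverse can be checked numerically. Assuming a hypothetical echo filter with a direct-path weight of 1 and an echo of weight 0.8 four taps later (written leading tap first, i.e. reversed relative to the right-to-left histograms of FIGS. 11A-11L), the series expansion of its inverse is truncated after $K$ terms and convolved back with the echo filter. The result is a delta-function plus a single residual term of size $0.8^{K}$, which shrinks as more terms are kept.

```python
def convolve(a, b):
    """Full discrete linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def truncated_echo_inverse(echo_gain, delay, n_terms):
    """First n_terms of 1 / (1 + g z^-d) = sum_k (-g)^k z^-(d k), leading tap first."""
    w = [0.0] * (delay * (n_terms - 1) + 1)
    for k in range(n_terms):
        w[k * delay] = (-echo_gain) ** k
    return w

# Hypothetical echo filter: direct path 1.0, echo 0.8 four taps later.
a = [1.0, 0.0, 0.0, 0.0, 0.8]
w = truncated_echo_inverse(0.8, 4, 10)
delta = convolve(w, a)   # 1 at lag 0, -(0.8 ** 10) at lag 40, zeros elsewhere
```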

The best results from the blind deconvolution process of this invention are seen when the ideal deconvolving filter is of finite length, as in the third example shown in FIGS. 11I-11L. FIG. 11I shows a set of exponentially decaying echoes spread out over 275 ms that may be inverted by a two-point filter shown in FIG. 11J with a small decaying correction on the left, which is an artifact of the truncation of the convolving filter shown in FIG. 11I. As seen in FIG. 11K, the learned filter corresponds almost exactly to the



solution. 

In contrast with these results, experience with similar tests of the H-J network shows that it occasionally fails to converge for two sources and rarely converges for three sources.
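The first failure mode noted earlier, that independent Gaussian sources cannot be separated, can be seen with a quick numerical sketch. A rotation of two independent unit-variance Gaussian sequences is again a pair of independent unit-variance Gaussians, so the mixed pair carries no statistical trace of the mixing. A rotation of non-Gaussian sources (here, hypothetically, Laplacian ones) instead pulls their excess kurtosis toward zero, which is the kind of non-Gaussian structure separation methods rely on. The sample size, seed and 45° rotation are arbitrary choices.

```python
import math
import random

def excess_kurtosis(x):
    """Fourth standardized moment minus 3; zero for a Gaussian."""
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / var ** 2 - 3.0

def rotate(a, b, theta):
    """Mix two sequences with an orthogonal (rotation) matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return ([c * p - s * q for p, q in zip(a, b)],
            [s * p + c * q for p, q in zip(a, b)])

rng = random.Random(1)
n = 100000
gauss = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(2)]
laplace = [[rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)] for _ in range(2)]

mixed_gauss = rotate(gauss[0], gauss[1], math.pi / 4)
mixed_laplace = rotate(laplace[0], laplace[1], math.pi / 4)
# Expected values: about 0 for both Gaussian cases, about 3 for a Laplacian source
# and about 1.5 for an equal-weight mixture of two Laplacian sources.
```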

Blind Deconvolution Results: Speech signals were convolved with various filters and the learning rules in Eqns. 26-27 were used to perform blind deconvolution. Some results are shown in FIGS. 11A-11L. The convolving filter



blind processing method of this invention [...] tap-spacing is great enough (100 sample intervals) that simple whitening cannot interfere noticeably with the deconvolution process.
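The third example's point, that exponentially decaying echoes can be inverted by a two-point filter, follows from the identity $(1 - \lambda z^{-1})\sum_{k=0}^{L-1}\lambda^k z^{-k} = 1 - \lambda^L z^{-L}$. The sketch below assumes a hypothetical decay factor $\lambda = 0.9$ and an echo train truncated to $L = 30$ taps (leading tap first). The only residue after convolution is the single term $-\lambda^L$ at lag $L$, which comes entirely from truncating the echo train.

```python
def convolve(a, b):
    """Full discrete linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def decaying_echoes(lam, length):
    """Hypothetical truncated echo train a_k = lam**k for k = 0 .. length-1."""
    return [lam ** k for k in range(length)]

a = decaying_echoes(0.9, 30)
w = [1.0, -0.9]        # two-point inverse 1 - lam z^-1
p = convolve(w, a)     # 1 at lag 0, -(0.9 ** 30) at lag 30, zeros in between
```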

The time domains of the convolving filters shown in FIGS. 11A, 11E and 11I contained some zero values. For example, FIG. 11E represents the filter [0.8, 0, 0, 0, 1]. Moreover, the taps were sometimes adjacent to each other, as in FIGS. 11A-11D, and sometimes spaced out in time, as in FIGS. 11I-11L. The leading weight of each filter is the right-most bar in each histogram, exemplified by bar 30 in FIG. 11I and bar 32 in FIG. 11G. [...] followed by those of the ideal deconvolving filter $[W_{ideal}]$, those of the filter produced by the process of this invention [W] and the time domain pattern produced by convolution of [W] and [A]. Ideally, the convolution [W]*[A] should be a delta-function consisting of only a single high value at the right-most position of the leading weight when [W] correctly inverts [A].

Clearly, other embodiments and modifications of this invention may occur readily to those of ordinary skill in the art in view of these teachings. Therefore, this invention is to be limited only by the following claims, which include all such embodiments and modifications when viewed in conjunction with the above specification and accompanying drawing.

I claim:

1. A method performed in a neural network having input means for receiving a plurality J of input signals $(X_j)$ and output means for producing a plurality I of output signals $(U_i)$ [...]



information redundancy among said output signals $(U_i)$, wherein $0<i\le I>1$ and $0<j\le J\ge I$ are integers, said method comprising:

(a) selecting initial values for said bias weights $(W_{i0})$ and said scaling weights $(W_{ij})$;

(b) producing a plurality I of training signals $(Y_i)$ responsive to a transformation of said input signals $(X_j)$ such that $Y_i = g(U_i)$, wherein $g(x)$ is a nonlinear function and the Jacobian of said transformation is $J = \det(\partial Y_i/\partial X_j)$ when $J = I$; and

(c) adjusting said bias weights $(W_{i0})$ and said scaling weights $(W_{ij})$ responsive to one or more samples of said training signals $(Y_i)$ such that each said bias weight $W_{i0}$ is changed proportionately to a corresponding bias measure $\Delta W_{i0}$ accumulated over said one or more samples and each said scaling weight $W_{ij}$ is changed proportionately to a corresponding scaling measure $\Delta W_{ij} = \epsilon\,\partial(\ln|J|)/\partial W_{ij}$ accumulated over said one or more samples, wherein $\epsilon>0$ is a learning rate.

2. The method of claim 1 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of the solutions to the equation

$$\frac{d}{dx}\,g(x) = 1 - |g(x)|^{r}$$

and said $\Delta W_{i0} = \epsilon\left(-r\,|Y_i|^{r-1}\operatorname{sgn}(Y_i)\right)$ accumulated over said one or more samples and each said scaling weight $W_{ij}$ is changed proportionately to a corresponding scaling measure $\Delta W_{ij} = \epsilon\left(\operatorname{cof}(W_{ij})/\det(W_{ij}) - X_j\,r\,|Y_i|^{r-1}\operatorname{sgn}(Y_i)\right)$ accumulated over said one or more samples.

3. The method of claim 1 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of $g_1(x) = \tanh(x)$ and $g_2(x) = (1+e^{-x})^{-1}$ and said $\Delta W_{i0}$ is selected from the group consisting essentially of $\Delta_1 W_{i0} = \epsilon(-2Y_i)$ and $\Delta_2 W_{i0} = \epsilon(1-2Y_i)$ accumulated over said one or more samples and each said scaling weight $W_{ij}$ is changed proportionately to a corresponding scaling measure $\Delta W_{ij}$ selected from the group consisting essentially of $\Delta_1 W_{ij} = \epsilon\left(\operatorname{cof}(W_{ij})/\det(W_{ij}) - 2X_jY_i\right)$ and $\Delta_2 W_{ij} = \epsilon\left(\operatorname{cof}(W_{ij})/\det(W_{ij}) + X_j(1-2Y_i)\right)$ accumulated over said one or more samples.

4. A neural-network implemented method for recovering one or more of a plurality I of independent source signals $(S_i)$ from a plurality $J\ge I$ of sensor signals $(X_j)$ each including a combination of at least some of said source signals $(S_i)$, wherein $0<i\le I>1$ and $0<j\le J\ge I$ are integers, said method comprising:

(a) selecting a plurality I of bias weights $(W_{i0})$ and a plurality $I^2$ of scaling weights $(W_{ij})$;

(b) adjusting said bias weights $(W_{i0})$ and said scaling weights $(W_{ij})$ by repeatedly performing the steps of:

(b.1) producing a plurality I of estimation signals $(U_i)$ responsive to said sensor signals $(X_j)$ such that $(U_i) = (W_{ij})(X_j) + (W_{i0})$;

(b.2) producing a plurality I of training signals $(Y_i)$ responsive to a transformation of said sensor signals $(X_j)$ such that $Y_i = g(U_i)$, wherein $g(x)$ is a nonlinear function and the Jacobian of said transformation is $J = \det(\partial Y_i/\partial X_j)$ when $J = I$; and

(b.3) adjusting each said bias weight $W_{i0}$ and each said scaling weight $W_{ij}$ responsive to one or more samples of said training signals $(Y_i)$ such that said each bias weight $W_{i0}$ is changed proportionately to a bias measure $\Delta W_{i0}$ accumulated over said one or more samples and said each scaling weight $W_{ij}$ is changed proportionately to a corresponding scaling measure $\Delta W_{ij} = \epsilon\,\partial(\ln|J|)/\partial W_{ij}$ accumulated over said one or more samples, wherein $\epsilon>0$ is a learning rate; and

(c) producing said estimation signals $(U_i)$ to represent said one or more recovered source signals $(S_i)$.

5. The method of claim 4 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of the solutions to the equation

$$\frac{d}{dx}\,g(x) = 1 - |g(x)|^{r}$$

and said $\Delta W_{i0} = \epsilon\left(-r\,|Y_i|^{r-1}\operatorname{sgn}(Y_i)\right)$ accumulated over said one or more samples and each said scaling weight $W_{ij}$ is changed proportionately to a corresponding scaling measure $\Delta W_{ij} = \epsilon\left(\operatorname{cof}(W_{ij})/\det(W_{ij}) - X_j\,r\,|Y_i|^{r-1}\operatorname{sgn}(Y_i)\right)$ accumulated over said one or more samples.

6. The method of claim 4 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of $g_1(x) = \tanh(x)$ and $g_2(x) = (1+e^{-x})^{-1}$ and said adjusting comprises:

(c) adjusting said bias weights $(W_{i0})$ and said scaling weights $(W_{ij})$ responsive to one or more samples of said training signals $(Y_i)$ such that each said bias weight $W_{i0}$ is changed proportionately to a corresponding bias measure $\Delta W_{i0}$ selected from the group consisting essentially of $\Delta_1 W_{i0} = \epsilon(-2Y_i)$ and $\Delta_2 W_{i0} = \epsilon(1-2Y_i)$ accumulated over said one or more samples and each said scaling weight $W_{ij}$ is changed proportionately to a corresponding scaling measure $\Delta W_{ij}$ selected from the group consisting essentially of $\Delta_1 W_{ij} = \epsilon\left(\operatorname{cof}(W_{ij})/\det(W_{ij}) - 2X_jY_i\right)$ and $\Delta_2 W_{ij} = \epsilon\left(\operatorname{cof}(W_{ij})/\det(W_{ij}) + X_j(1-2Y_i)\right)$ accumulated over said one or more samples.

7. A method implemented in a transversal filter having an input for receiving a sensor signal X that includes a combination of multipath reverberations of a source signal S and having a plurality I of delay line tap output signals $(T_i)$ distributed at intervals of one or more time delays $\tau$, said source signal S and said sensor signal X varying with time over a plurality $J\ge I$ of said time delay intervals $\tau$ such that said sensor signal X has a value $X_j$ at time $\tau(j-1)$ and each said delay line tap output signal $T_i$ has a value $X_{j-i+1}$ representing said sensor signal value $X_j$ delayed by a time interval $\tau(i-1)$, wherein $\tau>0$ is a predetermined constant and $0<i\le I>1$ and $0<j\le J\ge I$ are integers, said method recovering said source signal S from said sensor signal X and comprising:

(a) selecting a plurality I of filter weights $(W_i)$;

(b) adjusting said filter weights $(W_i)$ by repeatedly performing the steps of

(b.1) producing a plurality K=I of weighted tap output signals $(V_k)$ by combining said delay line tap output signals $(T_i)$ such that $(V_k) = (F_{ki})(T_i)$, wherein $0<k\le K=I>1$ are integers, and wherein $F_{ki} = W_{[\ldots]}$ when $1\le k+1-i\le I$ and $F_{ki} = 0$ otherwise,

(b.2) summing a plurality K=I of said weighted tap signals $(V_k)$ to produce an estimation signal $U = \sum_{k=1}^{K} V_k$, wherein said estimation signal U has a value $U_j$ at time $\tau(j-1)$;

(b.3) producing a plurality J of training signals $(Y_j)$ responsive to a transformation of said sensor signal values such that $Y_j = g(U_j)$, wherein $g(x)$ is a nonlinear function and the Jacobian of said transformation is $J = \det(\partial Y_j/\partial X_j)$ when $J = I$; and

(b.4) adjusting each said filter weight $W_i$ responsive to one or more samples of said training signals $(Y_j)$ such that said each filter weight $W_i$ is changed proportionately to a corresponding leading measure $\Delta W_1$ accumulated over said one or more samples when $i = 1$ and a corresponding scaling measure $\Delta W_i = \epsilon\,\partial(\ln|J|)/\partial W_i$ accumulated over said one or more samples otherwise; and

(c) producing said estimation signal U to represent said recovered source signal S.
8. The method of claim 7 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of $g_1(x) = \tanh(x)$ and $g_2(x) = (1+e^{-x})^{-1}$ and said $\Delta W_1$ is selected from the group consisting essentially of $\Delta_1 W_1 = \epsilon\left(1/W_1 - 2X_jY_j\right)$ and $\Delta_2 W_1 = \epsilon\left(1/W_1 + X_j(1-2Y_j)\right)$ accumulated over said one or more samples when $i = 1$ and a corresponding scaling measure $\Delta W_i$ selected from the group consisting essentially of $\Delta_1 W_i = \epsilon\left(-2X_{j-i+1}Y_j\right)$ and $\Delta_2 W_i = \epsilon\,X_{j-i+1}(1-2Y_j)$ accumulated over said one or more samples otherwise.

9. The method of claim 7 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of the solutions to the equation

$$\frac{d}{dx}\,g(x) = 1 - |g(x)|^{r}$$

and said leading measure $\Delta W_1$ [...] accumulated over said one or more samples when $i = 1$ and a corresponding scaling measure $\Delta W_i$ [...] accumulated over said one or more samples otherwise.

10. A neural network for recovering a plurality of source signals from a plurality of mixtures of said source signals, said neural network comprising:
input means for receiving a plurality J of input signals $(X_j)$ each including a combination of at least some of a plurality I of independent source signals $(S_i)$, wherein $0<i\le I>1$ and $0<j\le J\ge I$ are integers;
weight means coupled to said input means for storing a plurality I of bias weights $(W_{i0})$ and a plurality $I^2$ of scaling weights $(W_{ij})$;
output means coupled to said weight means for producing a plurality I of output signals $(U_i)$ responsive to said input signals $(X_j)$ such that $(U_i) = (W_{ij})(X_j) + (W_{i0})$;
training means coupled to said output means for producing a plurality I of training signals $(Y_i)$ responsive to a transformation of said input signals $(X_j)$ such that $Y_i = g(U_i)$, wherein $g(x)$ is a nonlinear function and the Jacobian of said transformation is $J = \det(\partial Y_i/\partial X_j)$ when $J = I$;
adjusting means coupled to said training means and said weight means for adjusting said bias weights $(W_{i0})$ and said scaling weights $(W_{ij})$ responsive to one or more samples of said training signals $(Y_i)$ such that each said bias weight $W_{i0}$ is changed proportionately to a corresponding bias measure $\Delta W_{i0}$ accumulated over said one or more samples and each said scaling weight $W_{ij}$ is changed proportionately to a corresponding scaling measure $\Delta W_{ij} = \epsilon\,\partial(\ln|J|)/\partial W_{ij}$ accumulated over said one or more samples, wherein $\epsilon>0$ is a learning rate.

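11. The neural network of claim 10 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of the solutions to the equation

$$\frac{d}{dx}\,g(x) = 1 - |g(x)|^{r}$$

and said bias measure $\Delta W_{i0} = \epsilon\left(-r\,|Y_i|^{r-1}\operatorname{sgn}(Y_i)\right)$ and said scaling measure $\Delta W_{ij} = \epsilon\left(\operatorname{cof}(W_{ij})/\det(W_{ij}) - X_j\,r\,|Y_i|^{r-1}\operatorname{sgn}(Y_i)\right)$.

12. The neural network of claim 10 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of $g_1(x) = \tanh(x)$ and $g_2(x) = (1+e^{-x})^{-1}$ and said bias measure $\Delta W_{i0}$ is selected from a group consisting essentially of $\Delta_1 W_{i0} = \epsilon(-2Y_i)$ and $\Delta_2 W_{i0} = \epsilon(1-2Y_i)$ and said scaling measure $\Delta W_{ij}$ is selected from a group consisting essentially of $\Delta_1 W_{ij} = \epsilon\left(\operatorname{cof}(W_{ij})/\det(W_{ij}) - 2X_jY_i\right)$ and $\Delta_2 W_{ij} = \epsilon\left(\operatorname{cof}(W_{ij})/\det(W_{ij}) + X_j(1-2Y_i)\right)$.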

13. A system for adaptively cancelling one or more interferer signals $(S_n)$ comprising:

input means for receiving a plurality J of input signals $(X_j)$ each including a combination of at least some of a plurality I of independent source signals $(S_i)$ that includes said one or more interferer signals $(S_n)$, wherein $0<i\le I>1$, $0<j\le J\ge I$ and $0<n\le N\ge 1$ are integers;

weight means coupled to said input means for storing a plurality I of bias weights $(W_{i0})$ and a plurality $I^2$ of scaling weights $(W_{ij})$;

output means coupled to said weight means for producing a plurality I of output signals $(U_i)$ responsive to said input signals $(X_j)$ such that $(U_i) = (W_{ij})(X_j) + (W_{i0})$;

training means coupled to said output means for producing a plurality I of training signals $(Y_i)$ responsive to a transformation of said input signals $(X_j)$ such that $Y_i = g(U_i)$, wherein $g(x)$ is a nonlinear function and the Jacobian of said transformation is $J = \det(\partial Y_i/\partial X_j)$;

adjusting means coupled to said training means and said weight means for adjusting said bias weights $(W_{i0})$ and said scaling weights $(W_{ij})$ responsive to one or more samples of said training signals $(Y_i)$ such that each said bias weight $W_{i0}$ is changed proportionately to a corresponding bias measure $\Delta W_{i0}$ accumulated over said one or more samples and each said scaling weight $W_{ij}$ is changed proportionately to a corresponding scaling measure $\Delta W_{ij} = \epsilon\,\partial(\ln|J|)/\partial W_{ij}$ accumulated over said one or more samples, wherein $\epsilon>0$ is a learning rate; and

feedback means coupled to said output means and said input means for selecting one or more said output signals $(U_n)$ representing said one or more interferer signals $(S_n)$ for combination with said input signals $(X_j)$, thereby cancelling said interferer signals $(S_n)$.

14. The system of claim 13 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of the solutions to the equation

$$\frac{d}{dx}\,g(x) = 1 - |g(x)|^{r}$$

and said bias measure $\Delta W_{i0} = \epsilon\left(-r\,|Y_i|^{r-1}\operatorname{sgn}(Y_i)\right)$ and said scaling measure $\Delta W_{ij} = \epsilon\left(\operatorname{cof}(W_{ij})/\det(W_{ij}) - X_j\,r\,|Y_i|^{r-1}\operatorname{sgn}(Y_i)\right)$.

15. The system of claim 13 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of $g_1(x) = \tanh(x)$ and $g_2(x) = (1+e^{-x})^{-1}$ and said bias measure $\Delta W_{i0}$ is selected from a group consisting essentially of $\Delta_1 W_{i0} = \epsilon(-2Y_i)$ and $\Delta_2 W_{i0} = \epsilon(1-2Y_i)$ and said scaling measure $\Delta W_{ij}$ is selected from a group consisting essentially of $\Delta_1 W_{ij} = \epsilon\left(\operatorname{cof}(W_{ij})/\det(W_{ij}) - 2X_jY_i\right)$ and $\Delta_2 W_{ij} = \epsilon\left(\operatorname{cof}(W_{ij})/\det(W_{ij}) + X_j(1-2Y_i)\right)$.
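The tanh-case deconvolution measures for the transversal filter (claims 7 and 8) can be obtained directly from the definition of $\Delta W_i$ as a derivative of $\ln|J|$, and the derivation can be checked numerically. For a causal filter $U_j = \sum_i W_i X_{j-i+1}$, the Jacobian $\partial Y/\partial X$ over a block of samples is lower triangular with diagonal entries $W_1 g'(U_j)$, so $\ln|J| = \sum_j \left(\ln|W_1| + \ln g'(U_j)\right)$. The sketch below compares the resulting analytic measures with finite differences; the signal values, weights and function names are hypothetical, and it illustrates the derivation only, not a patented implementation.

```python
import math
import random

def filter_output(w, x):
    """U_j = sum_i W_i X_{j-i+1}; w[0] is the leading weight W_1, earlier samples are 0."""
    return [sum(w[i] * x[j - i] for i in range(len(w)) if j >= i) for j in range(len(x))]

def log_sech2(u):
    """ln(1 - tanh(u)^2) = -2 ln cosh(u), computed without overflow."""
    a = abs(u)
    return 2.0 * (math.log(2.0) - a - math.log1p(math.exp(-2.0 * a)))

def log_jacobian(w, x):
    """ln|J| for Y_j = tanh(U_j): lower-triangular Jacobian with diagonal W_1 g'(U_j)."""
    return sum(math.log(abs(w[0])) + log_sech2(u) for u in filter_output(w, x))

def tanh_measures(w, x):
    """Summed tanh-case measures before scaling by eps:
    leading weight: sum_j (1/W_1 - 2 X_j Y_j); others: sum_j (-2 X_{j-i+1} Y_j)."""
    y = [math.tanh(u) for u in filter_output(w, x)]
    grad = [0.0] * len(w)
    for j, yj in enumerate(y):
        grad[0] += 1.0 / w[0] - 2.0 * x[j] * yj
        for i in range(1, len(w)):
            if j >= i:
                grad[i] += -2.0 * x[j - i] * yj
    return grad

rng = random.Random(2)
x = [rng.uniform(-1.0, 1.0) for _ in range(40)]
w = [1.0, 0.3, -0.2]
grad = tanh_measures(w, x)
```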
1. A medical system for separating electrocardiogram (EKG) signals, comprising:

a receiving module configured to receive a plurality J of recorded EKG signals $X_j$ from a plurality of EKG sensors;

a computing module configured to separate the received signals using independent component analysis to produce a plurality I of separated signals $Y_i$; and

a display module configured to display the separated signals.

2. The medical system of claim 1, wherein the display module is further configured to display 
at least a portion of the separated signals in a chaos phase space portrait. 

3. The medical system of claim 2, wherein the separated signals include three components of QRS complex, and wherein the display module is further configured to display at least the three QRS complex components in a chaos phase space portrait.

4. The medical system of claim 1, wherein the computing module is configured to separate the recorded signals by multiplying the recorded signals by a matrix $W_{ij}$ such that $Y_i = \sum_j W_{ij} X_j$.
5. The medical system of claim 1, wherein the computing module is configured to separate the recorded signals using a neural-network implemented method, the method comprising:

selecting a plurality I of bias weights $W_{i0}$ and a plurality $I \times J$ of scaling weights $W_{ij}$;
adjusting the bias weights $W_{i0}$ and the scaling weights $W_{ij}$ to minimize information redundancy among separated signals; and
producing separated signals $Y_i$ such that $Y_i = \sum_j W_{ij} X_j + W_{i0}$.

6. The medical system of claim 1, further comprising a database storing a plurality of EKG signal triggers and corresponding diagnoses, and a matching module configured to match the separated signals with one or more of the stored EKG signal triggers.

7. A computer-implemented method of separating electrocardiogram (EKG) recording signals, the method comprising:

receiving a first plurality of EKG recording signals from EKG sensors placed on a patient;

separating the first plurality of EKG recording signals using independent component analysis to produce a second plurality of separated signals; and

displaying the separated signals.

8. The method of claim 7, further comprising displaying at least a portion of the separated 
signals in a chaos phase space portrait. 

9. The method of claim 7, wherein the patient is a pregnant patient, and wherein the separated signals include separated signals originating from the pregnant patient and separated signals originating from a fetus.

10. The method of claim 7, wherein the displayed separated signals are used by a physician to 
determine the likelihood of arrhythmia in the patient. 
11. The method of claim 7, wherein the displayed separated signals are used by a physician to
determine the likelihood of myocardial infarction in the patient. 

12. The method of claim 7, wherein each of the separated signals corresponds to a location on the patient's body, wherein the displayed separated signals are used by a physician to determine the location of an abnormal heart condition in the patient according to the separated signals' corresponding locations.

13. A computer-assisted method of detecting arrhythmia in a patient, the method comprising: 

placing a first plurality of EKG sensors on a patient to produce a first plurality of 
channels of recorded EKG signals; 

sending the recorded signals to a computing module to separate the first plurality of 
EKG recorded signals into a first plurality of channels of separated signals using 
independent component analysis; and 

reviewing a display of the separated signals to determine the existence of 
arrhythmia in the patient. 

14. The method of claim 13, wherein reviewing a display of the separated signals comprises 
identifying a second set of one or more channels of separated signals that indicate arrhythmia, the 
method further comprising determining a probable location of arrhythmia according to the 
respective channel numbers of the second set of separated signals. 

15. The method of claim 13, wherein placing a first plurality of EKG sensors comprises placing a plurality of EKG sensors on more than 10 body surface locations of a patient's torso.

16. The method of claim 13, wherein placing a first plurality of EKG sensors comprises placing a plurality of EKG sensors on more than 40 body surface locations of a patient's torso.

17. A cardiac rhythm management system comprising: 

a cardiac signal recording module configured to record cardiac signals of a patient; 

a computing module configured to separate the recorded cardiac signals into 
separated signals using independent component analysis; 

a detection module configured to detect or predict an abnormal condition based on 
analyzing the separated cardiac signals; and 

a treatment module configured to treat the patient when the abnormal condition is 
detected or predicted. 

18. The cardiac rhythm management system of claim 17, wherein the detection module is 
configured to compare the separated signals with a plurality of stored triggers to determine whether 
the separated signals match a stored trigger. 

19. A cardiac rhythm management system comprising: 

a cardiac signal recording module configured to record cardiac signals of a patient; 
a computing module configured to separate the recorded cardiac signals into 
separated signals using independent component analysis; 
a detection module configured to detect or predict an abnormal condition based on
analyzing the separated cardiac signals; and 

a warning module configured to issue a warning when the abnormal condition is 
detected or predicted. 

20. The cardiac rhythm management system of claim 19, wherein the detection module is 
configured to compare the separated signals with a plurality of stored triggers to determine whether 
the separated signals match a stored trigger. 